INTRODUCTION


The Problem of This Study 

Today the school is being increasingly called upon to teach resistance to propaganda. Social studies teachers in particular are expected to develop in their pupils habits of critical thinking with respect to controversial social issues. If social studies teachers aim to immunize their pupils against propaganda, they must organize effective curricular materials for classroom use. This study reports the results of one attempt to organize such curricular materials, and to determine their effectiveness experimentally.

The purposes of this investigation were to determine: (1) the effectiveness of a unit of instruction designed for use in teaching high school social studies pupils to resist propaganda; (2) the degree to which knowledge concerning a selected controversial social issue is associated with attitude toward that issue, and also with the shift of attitude toward that issue stimulated by reading a selection designed to shift that attitude; and (3) the degree to which intelligence is associated with attitude toward the selected controversial social issue, and also with the shift of attitude toward that issue stimulated as described above.


Unique Aspects of This Study 

While separate experiments have dealt with individual phases of the problem studied in this investigation, there appears to be no study which has made the same approach as this one or which has included all three phases of the problem in the same study. Biddle,¹ like the present writer, attempted to teach critical thinking directly by leading pupils to study the forms in which propaganda commonly appears. His study, while closely related to this experiment, differed from it in several particulars. For instance, he expressed the feeling that a clear measure of the pupils' understanding of the lessons would have helped him in interpreting his study; he did not make a direct study of opinions or attitudes; and he gave the experimental group prior instruction concerning the propaganda devices commonly employed in connection with the very social issue which he used as the basis of his testing. Anderson² has suggested the desirability of a direct study of opinions. Studies by Annis,³ Campbell,⁴ Bateman and Remmers,⁵ Marple,⁶ McConnell,⁷ Peregrine,⁸ Remmers,⁹ and others, while showing that attitudes can be shifted and that such shifts tend to persist, made no attempt to immunize pupils against the effects of propaganda designed to shift their attitudes.


1 William W. Biddle, Propaganda and Education. Contributions to Education, No. 531. New York: Bureau of Publications, Teachers College, Columbia University, 1932.

2 H. R. Anderson, “Classroom Evaluation of the Awareness of Propaganda,” Seventh Yearbook of the National Council for the Social Studies, Education Against Propaganda, pp. 171-182. Lawrence Hall, Cambridge, Massachusetts: The National Council for the Social Studies, 1937.

Thesis, State University of Iowa, 1931.

Influencing Pupils’ Attitudes Toward Peace and War. Doc-

ment of the requirements for the degree of Doctor of Philosophy

Although Closson¹⁰ reported a negative relationship between information about races and race prejudice, no studies were found which involved knowledge as a factor in attitude shift resulting from a specific propaganda stimulus. Bateman and Remmers¹¹ concluded that intelligence is unrelated to attitude shift, but Biddle¹² reported a relationship between intelligence and resistance to propaganda. Anderson,¹³ in an informal study, found a tendency for lack of understanding of political terms to be associated with the expression of neutral attitudes toward such terms.


Definitions 

Attitude and opinion: These two words are common in everyday speech, and they are often used synonymously. Thurstone and Chave¹⁴ use the word “attitude” “to denote the sum-total of a man’s inclinations and feelings, prejudice or bias, preconceived notions, ideas, fears, threats, and convictions about any specific topic.” They¹⁵ take the word “opinion” to “mean a verbal expression of attitude.” When opinions were measured in this study, it was assumed that such measurement could properly be considered measurement of attitude. This is the point of view taken by Murphy and Likert¹⁶ in their recent book, Public Opinion and the Individual.

Attitude variable: This term was used in this study to refer to the specific issue on which a measure was sought of an individual’s position along a linear scale ranging from a negative attitude extreme, through a neutral or indifferent attitude, to a positive attitude extreme.¹⁷

10 Eugene E. Closson, A Study of the Factor of Information in Race Prejudice. Master’s Thesis, State University of Iowa, 1930.

11 Richard M. Bateman and H. H. Remmers, op. cit., p. 30.

12 William W. Biddle, op. cit., pp. 59-60.

13 H. R. Anderson, “Testing Attitude and Understanding,” Social Education, II (March, 1938), 177-1

14 L. L. Thurstone and E. J. Chave, The Measurement of Attitude: A Psychophysical Method and Some Experiments with a Scale for Measuring Attitude Toward the Church, pp. 6-7. Chicago: The University of Chicago Press, 1929.

16 Gardner Murphy and Rensis Likert, Public Opinion and the Individual, p. 3. New York: Harper and Brothers, Publishers, 1938.

Propaganda: The literature on propaganda reveals a wide variety of definitions. For purposes of this study the writer has defined propaganda by this statement: “Propaganda is the expression of opinion or action by a single person or a group of persons in such a way as to influence the opinions or actions of one or more other persons.” This definition does not hold that propaganda is either good or bad. It also provides for both intentional and non-intentional influences. Strong¹⁸ suggests a shorter definition which appears to be in complete accord with the above definition when he says of propaganda: “It is a synonym for influencing.” The definition adopted for use in this study is an extension of the following one formulated by the staff of the Institute for Propaganda Analysis:¹⁹


As generally understood, propaganda is expression of opinion or action by individuals or groups deliberately designed to influence opinions or actions of other individuals or groups with reference to predetermined ends.


Doob²⁰ has suggested that a complete definition of propaganda would recognize that a person is often a propagandist unintentionally or unknowingly. No attempt has been made in these definitions of propaganda to distinguish between “argumentative persuasion” and “propagandistic persuasion” in the sense suggested by Biddle.²¹

The ability to resist propaganda: In this study the ability to resist propaganda was understood to be the ability to identify expressions of opinion or action which influence one to favor or oppose a given opinion or action, and to hold one’s judgment or action in abeyance until the desirability of the proposed opinion or action from one’s own point of view is determined. The ability to resist propaganda can never be absolute. We can never hope to escape being influenced, to a degree at least, by propaganda.

Knowledge or understanding of attitude variable: The score on an objective achievement test constructed by the writer was the only measure of knowledge or understanding of the attitude variable utilized in this study.

Intelligence: The intelligence quotient of each pupil, as reported by the school officials, or as secured by the writer where the school did not have such records, was used as the basis for defining intelligence. Since several different commercially available intelligence tests were involved, it was necessary to assume that these tests all yielded comparable intelligence quotients. This assumption is justified at least in part: intelligence quotients based on scores from such tests are well known to vary from one test to another, but they are usually highly and positively correlated.


18 Edward K. Strong, Psychological Aspects of …, p. 268. New York: McGraw-Hill Book Company,

19 Institute for Propaganda Analysis, … Monthly Letter, I (October, 1937), 1-4.

20 Leonard W. Doob, Propaganda, p. 89. New York: Henry Holt and Company, 1935.

21 William W. Biddle, op. cit., p. 62.

Difference between education and propaganda: While education and propaganda overlap in many respects, meaningful differences between them can be distinguished. Lasswell²² has provided a distinction between education and propaganda:


. . . the processes by which such techniques as those of spelling, letter forming, piano playing, lathe handling and dialectics are transmitted may be called education, while those by which value dispositions (hatred or respect toward a person, group or policy) are organized may be called propaganda. The inculcation of traditional value attitudes is generally called education, while the term propaganda is reserved for the spreading of the subversive, debatable or merely novel attitudes.

Strong²³ also shows how education and propaganda overlap, yet how they differ in important aspects:


Psychologists define education as “the development of abilities, attitudes, or forms of behavior, and the acquisition of knowledge, as a result of teaching or training.” Evidently both education and propaganda have at least one common objective, namely, the development of attitudes. When one takes into account that part of education is for the purpose of inculcating in youth the attitudes of the majority of adults, one must realize that education and propaganda overlap considerably. In this sense education is propaganda and in behalf of the status quo, while propaganda is for the new and especially for what is contrary to the status quo.

But education has two other objectives, namely, the development of abilities and the acquisition of knowledge, which are only minor considerations in propaganda. Although some educators are genuine crusaders and make every effort to win converts to their views, just as do propagandists, the bulk of teaching has to do with the inculcation of knowledge, of what is accepted by experts in the field of truth, and the instructor is satisfied if the majority of the class pass the final examination.


22 H. D. Lasswell, “Propaganda,” Encyclopedia of the Social Sciences, XII (November, 1937), 522.

23 Edward K. Strong, op. cit., pp. 267-268.


Unit of instruction used in this experiment: A complete description of this unit of instruction is presented in a later portion of this article. It should be made clear, however, that the time devoted to teaching the unit, and the conditions under which this teaching was done, are essential parts of a complete definition of the unit. If, then, the results of this experiment show the unit to be effective or ineffective in achieving its purposes, such a conclusion will not necessarily be valid for all time allotments and for all conditions. It is possible, however, that the correlation between knowledge concerning propaganda devices and resistance to propaganda will prove to be negligible. Such a finding would make the provision of increased time allotments for a study of the methods of the propagandist a procedure of doubtful value.


Method Used in This Study to Develop Pupil Ability to Resist Propaganda

Too often our approach to the study of propaganda has assumed propaganda in any form to be a menace. The word “menace” implies that all propaganda is bad and should be forcibly suppressed. Such a point of view, however, is in general rejected by those who have made a study of the problem of propaganda. “Propaganda,” asserts Lasswell,²⁴ “as a mere tool is no more moral or immoral than a pump handle.” The Institute for Propaganda Analysis takes a similar stand.²⁵

It is, of course, possible to set up certain legal safeguards. We have laws against false statements in advertising. We have libel laws which give a measure of protection against untruth. But there is a limit to the extent to which legal standards of truth can be used without establishing the type of suppression or censorship which is undesirable in a democracy.

Since suppression of propaganda is not acceptable in a democratic society, there are those who would make no direct attempt to develop the ability to resist propaganda, but would merely try to foster active competition among conflicting propagandas as a basis for developing critical thinking on the part of the public. There is, however, no proof that open competition between conflicting propagandas will lead the public toward habits of critical thinking. Even if such competition were to produce more intelligent judgments on the part of the public, Biddle²⁶ believes “that the prizes of propaganda go usually to the biggest checkbook, not to the greatest truth.”


24 H. D. Lasswell, op. cit., p. 525.

25 Institute for Propaganda Analysis, op. cit., p. 2.


A third suggestion is to teach the pupils in our schools to detect and analyze propaganda in order that they will be more immune to its influence.²⁷

Opinions vary as to the most effective method of education against propaganda. Two general methods have been suggested. The first method of teaching critical thinking is based on the conviction that there are no short-cuts whereby an individual can develop resistance to propaganda. According to this method, resistance to propaganda would be achieved by developing in the individual, throughout his school experience, habits of approaching conflict situations from an intellectual, problem-solving point of view. Content would be emphasized.


The second method of teaching resistance to propaganda would make a direct study of the tricks or techniques of propaganda, and of how they appeal to our emotions and lead us to uncritical acceptance of opinions or suggested actions. Here the emphasis would be on “form” instead of “content.”²⁸ A considerable body of curricular literature is available suggesting student activities whereby this second method can be put into operation. Because of the recent popularization of this method in our schools, it was used in this study.


Measurement of Attitude 


Although this study was not directly concerned with the development of the theories or the techniques of attitude measurement, a careful study of the development of attitude-measuring scales was made.


26 William W. Biddle, op. cit., p. 7.

27 Clyde R. Miller, How to Detect and Analyze Propaganda, p. 7. A Town Hall Pamphlet. 123 West 43rd Street, New York City: The Town Hall, Inc., 1939.

28 Biddle (op. cit., p. 18) has suggested the terms “form” and “content” as a useful general way of distinguishing between the two methods.

Thurstone,²⁹ Remmers,³⁰ and Murphy and Likert,³¹ among others, have contributed to the development of techniques for the measurement of attitudes. An attitude scale developed by Thurstone and one of his associates was used in this study. Thurstone’s methods of developing attitude scales are widely accepted as valid. For example, La Piere and Farnsworth³² say, “Perhaps the most adequate method so far developed for measuring symbolic attitudes is that of L. L. Thurstone . . .” Remmers³³ and Likert³⁴ have built on Thurstone’s techniques. Likert³⁵ developed techniques which make the preparation of such scales easier. Remmers and his students³⁶ have widened the scope of usefulness of attitude scales by the development of a series of generalized attitude scales. Each of Thurstone’s scales, by contrast, is valid only for the measurement of attitude toward a single attitude variable. Since this study did not involve the building of an attitude scale, and was concerned with only a single attitude variable, it was thought best to use one of Thurstone’s scales.
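
For readers unfamiliar with how a Thurstone-type scale yields a single score, a minimal sketch may help. In the Thurstone and Chave procedure, each statement carries a scale value fixed in advance by a panel of judges, the respondent checks the statements he agrees with, and his score is the median scale value of the statements checked (some scales use the mean instead). The statement labels and scale values below are hypothetical placeholders, not the items of the Peterson and Thurstone capital punishment scale.

```python
from statistics import median

# Hypothetical scale values on an 11-point continuum for a few illustrative
# statements; these are NOT the actual Peterson-Thurstone items or values.
SCALE_VALUES = {
    "s1": 0.8,
    "s2": 2.5,
    "s3": 5.5,
    "s4": 8.1,
    "s5": 10.2,
}


def thurstone_score(endorsed: list[str]) -> float:
    """Score a respondent as the median scale value of the statements endorsed."""
    if not endorsed:
        raise ValueError("respondent endorsed no statements; score is undefined")
    return median(SCALE_VALUES[s] for s in endorsed)


print(thurstone_score(["s3", "s4", "s5"]))  # 8.1
print(thurstone_score(["s2", "s3"]))        # 4.0
```

Because the score depends only on which statements are endorsed, two equivalent forms (such as Form A and Form B) can be built from different statements with matching scale values, which is what makes a before-and-after comparison possible.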

La Piere and Farnsworth,³⁷ in discussing attitudinal measurement and its value, apply as a check the extent to which such measurement actually predicts human behavior in real-life situations. They conclude that present techniques are adequate only for the measurement of professed or symbolic attitudes. They³⁸ say:


Definitely accurate and useful is the measurement of attitudes which find their expression only through symbolic means, such as those toward candidates for public office. Experience has shown that attitudes of this order can be rather accurately measured and that the results have high predictive value. If taken a relatively short time before an election, an adequate testing of the electorate by “sample ballot” methods appears to give an accurate indication of what will occur in the election.


29 L. L. Thurstone, “Attitudes Can Be Measured,” American Journal of Sociology, XXXIII (January, 1928), 94.

30 H. H. Remmers, “Generalized Attitude Scales: Studies in Social Psychological Measurements,” Bulletin of Purdue University, XXXVII (December, 1936), 75.

31 Gardner Murphy and Rensis Likert, op. cit.

32 Richard T. La Piere and Paul R. Farnsworth, Social …, p. 237. New York: McGraw-Hill Book Company

33 H. H. Remmers, “Studies in Attitudes: A Contribution to Social Psychological Research Methods,” Studies in Higher Education, Bulletin of Purdue University, XXXV (December, 1934), No. 4. Purdue University, Lafayette, Indiana: The Director of the Division of Educational Reference.

H. H. Remmers, “Further Studies in Attitudes, Series II,” Studies in Higher Education, Bulletin of Purdue University, XXXVII (December, 1936), No. 4. Purdue University, Lafayette, Indiana: The Director of the Division of Educational Reference.

34 Rensis Likert, “A Technique for the Measurement of Attitudes,” Archives of Psychology, XXII, No. 140, June, 1932. Pp. 55.


It was believed that overt action, as an expression of attitude, should receive subordinate emphasis in any attempt to make predictions concerning the actual expression of the attitude measured in this study, i.e., attitude toward capital punishment. Rather, with respect to this issue, an individual’s attitude would likely be expressed only through symbolic means.


SUMMARY OF REPRESENTATIVE RESEARCH RELATED TO THE PROBLEM


The Relation of Knowledge to Attitude 

Anderson³⁹ tentatively concluded that, among high school pupils, lack of understanding of social issues was correlated with a tendency to express neutral attitudes toward such issues. Closson⁴⁰ found a correlation of -.594 between information concerning, and expressed antipathy toward, races.
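
Closson’s coefficient of -.594 is presumably a Pearson product-moment correlation, the usual index at the time; assuming so, the following sketch shows how such a coefficient is computed from paired scores. The two score lists are invented for illustration and are not Closson’s data. A negative r simply means that pupils with higher information scores tended to express less antipathy.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between paired scores x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / sqrt(sxx * syy)


# Invented scores for five pupils (not data from any study cited here).
information = [12, 18, 25, 31, 40]
antipathy = [9, 8, 6, 5, 2]
print(round(pearson_r(information, antipathy), 3))
```

The same function applies to the correlations this study itself seeks, for example between achievement-test scores on capital punishment and attitude scores, or between intelligence quotients and attitude shifts.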


The Relation of Knowledge and Intelligence to Attitude and Attitude Shift

Bateman and Remmers⁴¹ found correlations between intelligence and attitude shift to be negligible. Bolton⁴² reported very low correlations between intelligence and attitude toward the Negro, and between knowledge concerning and attitude toward the Negro.


Studies in the Shifting of Attitude 


Annis⁴³ used “planted” editorials in a successful attempt to build up, in university students, both adverse and favorable opinion toward a public representative of a foreign country. Bateman and Remmers⁴⁴ found that: (1) high school social studies pupils’ attitudes toward social issues could be shifted toward either a favorable or an unfavorable extreme, as desired, by the reading of literary selections; and (2) high school social studies pupils’ attitudes toward social issues tended to remain stable when well-balanced instruction (favorable and unfavorable information) concerning them was presented. Bolton⁴⁵ found that freshman and sophomore college women did not shift their attitudes toward the Negro as a result of being taught facts concerning Negro education. This experimenter concluded tentatively that attitude toward the Negro was largely conditioned “by a cultural pattern of social organization which is accepted by individual members in the group.”⁴⁶ Campbell⁴⁷ concluded that direct instruction was superior to incidental instruction in increasing opposition to war. Marple⁴⁸ reported that awareness of the opinion of one’s own group and awareness of the opinion of experts concerning social issues caused individuals to shift their opinions toward agreement. Opinion changes were most frequent among high school seniors, less frequent among college seniors, and least frequent among adults; at each maturity level, however, awareness of group opinion was more influential than awareness of expert opinion.


39 H. R. Anderson, “Testing Attitude and Understanding,” op. cit.

40 Eugene E. Closson, op. cit.

41 Richard M. Bateman and H. H. Remmers, op. cit.

42 Euri Belle Bolton, “Effect of Knowledge Upon Attitudes Towards the Negro,” Journal of Social Psychology, VI (February, 1935), 68-90.

43 Albert David Annis, op. cit.

44 Richard M. Bateman and H. H. Remmers, op. cit.

Research in Teaching Resistance to Propaganda

An experiment conducted by Biddle⁴⁹ was designed to determine the effectiveness of a series of lessons entitled Manipulating the Public in teaching resistance to propaganda to high school senior and college freshman social studies students. Although Biddle interpreted the results of his experiment as supporting the conclusion that his experimental group did develop greater resistance to propaganda than his control group, the validity of the statistical analyses which he used may be questioned.⁵⁰


45 Euri Belle Bolton, op. cit.

46 Ibid., p. 88.

47 D. W. Campbell, op. cit.

48 Clare H. Marple, op. cit.

49 William W. Biddle, op. cit.

50 … Biddle since the publication of his study.


PROCEDURE


The Design of the Experiment

Pupils in twenty pairs of eleventh- and twelfth-grade social studies classes in seventeen Iowa high schools participated in this experiment; both classes in each pair had the same teacher. By chance, one class in each pair was designated as an experimental class and the other as a control class. The pupils in each of the twenty experimental classes studied, for six days, a unit of instruction entitled Public Opinion and Propaganda. During this six-day period, the pupils in each of the twenty control classes carried on their regular class work and were assigned no work on the subject of propaganda.

On the closing day of the fourth full week following the beginning of the six-day instructional period, all pupils in both the experimental and control groups who were in attendance were given, in a single sitting of forty-one minutes of working time, the following exercises: (1) an initial measurement of attitude toward capital punishment, Attitude Toward Capital Punishment, Form A, by Ruth C. Peterson and L. L. Thurstone;⁵¹ (2) an achievement test, Knowledge Concerning Capital Punishment, prepared by the writer; (3) a propaganda reading selection, Why Capital Punishment is Necessary, prepared by the writer and in part adapted from a similar selection written by Richard M. Bateman;⁵² and (4) a post-measurement of attitude toward capital punishment, Attitude Toward Capital Punishment, Form B, by Ruth C. Peterson and L. L. Thurstone.⁵³ The interval of approximately three weeks between these two phases of the experiment was allowed for two reasons. First, it was desired to mask the connection between the two phases, and second, the writer felt that this delay would test the permanence of the learning resulting from the instructional period.

On the closing day of the sixth full week following the beginning of the six-day instructional period, all pupils in both the experimental and the control groups who were in attendance were given a delayed measurement of attitude toward capital punishment, Attitude Toward Capital Punishment, Form B, by Ruth C. Peterson and L. L. Thurstone.

Within the current school year, at no time prior to or during the six-week period were any of the pupils given instruction concerning capital punishment as a social issue.

Within a three-week period following the six-week period just described, a mental test, Otis Quick Scoring Mental Ability Test, Gamma Test, Form A, by Arthur S. Otis,⁵⁴ was given to nearly all pupils in the experiment for whom the local schools did not have recent mental test results on file.


51 Chicago: University of Chicago Press, 1931.

52 Richard M. Bateman and H. H. Remmers, op. cit.

53 Chicago: University of Chicago Press, 1931.


Preparation of Instructional Materials 

The Institute for Propaganda Analysis has issued a considerable body of instructional materials designed for use in teaching resistance to propaganda. These materials have been judged valuable by teachers who have used them. Since this experiment made no provision for extensive tryout of the materials before use, the suggestions included in the Institute’s publication, Propaganda, How to Recognize It and Deal With It (Experimental Unit of Study Materials in Propaganda Analysis for Use in Junior and Senior High Schools),⁵⁵ were used freely in the unit prepared by the writer, Public Opinion and Propaganda (A Self-Study Booklet Containing a Unit of Study Materials in Propaganda Analysis for Use in Senior High School). The Institute readily gave permission for use of its materials in this unit. Auxiliary aids to the self-study booklet were bound in five separate booklets: (1) Readings I: Articles Which Present One-Sided Views on Important Public Issues; (2) Readings II: Articles Which Lead Us to Look on Both Sides of Public Issues, and Which Help Us to See How the Propagandist Works; (3) Summary I: Interests to Which Propagandists Appeal; (4) Summary II: The Tricks of the Propagandist; (5) Examination: Public Opinion and Propaganda. In addition to these materials, each teacher was provided with a mounted exhibit of commercial advertisements selected to illustrate the points made in the instructional materials.

The unit of instruction consisted of four lessons, each of which included one or more parts. Each part of a lesson included these divisions: Discussion, Student Activities, and Notes for Class Discussion. It was deemed desirable to keep teacher procedures as uniform as possible across experimental classes. The reference materials and directions for student activities were useful in achieving this uniformity.

In an attempt to provide for a wide range of pupil ability, considerable optional reading material was included. An attempt was also made to impress on the teachers the fact that the same amount of work was not expected of all pupils.


54 Yonkers-on-Hudson, New York: World Book Company, 1937.

55 132 Morningside Drive, New York: The Institute for Propaganda Analysis, 1938.

The examination was prepared to serve as an aid in summarizing the learnings of the unit, and not primarily as a measurement device. The teachers were requested to spend a brief time on the seventh day in discussing the responses to the items included in the examination.

Coincident with the preparation of the instructional materials for pupil use, two aids for teacher use were developed: (1) Teacher Reference Materials, and (2) Teacher Procedures for Presenting the Unit of Instruction. In addition, directions were included for the preparation of a Teacher’s Log.

The unit was tried out under actual classroom conditions by two teachers prior to its final issuance for use in this study. On the basis of these preliminary trials, the instructional materials were revised. The materials entitled Teacher Procedures for Presenting the Unit of Instruction were prepared on the basis of these trials.⁵⁶


The Test and Propaganda Battery 

The reader will remember that, in the first phase of the experiment, the emphasis was placed upon the immunization of pupils in the experimental groups against the effects of propaganda. In the second phase of the experiment, the pupils in both the experimental and control groups were confronted with an attitude variable which was a new situation in that there had been no prior class study concerning it within the current semester. As previously stated, this attitude variable was attitude toward capital punishment. Particular care was taken to provide this new situation, since the whole object of the instruction was to give the pupils in the experimental groups an understanding of a set of principles which they could use to resist propaganda for or against any social issue. If the objective of the unit of instruction had been to teach a critical attitude toward propaganda for or against a specific social issue, propaganda as it commonly appears in connection with that issue would have been studied directly during the first phase of the experiment. It will be recalled that this is just what Biddle did in his experiment.⁵⁷

56 Copies of all the instructional materials used in this investigation, together with directions for their use, are included in the appendix of the writer’s dissertation on file at the College of Education Library, State University of Iowa.

57 William W. Biddle, op. cit.

Although the word propaganda did not appear in the second phase of the experiment, no direct attempt was made to hide the fact that only one side of the argument was presented in the propaganda reading selection. The propaganda was nonetheless hidden to a degree, in that the pupil was given the responsibility for any transfer of learning that was made. The selection was introduced with the statement: “Below is a selection which gives information which will help you to see why capital punishment is considered necessary. Please read it carefully because an understanding of this article will help you on the rest of the test.”


This attempt to capitalize on the prestige which text material usually enjoys in classroom and test situations is subject to the criticism that the pupils were put under a “pressure situation” in which they responded as they thought they were expected to, irrespective of their recognition of the propaganda tricks used in the selection.


The writer’s answer must be that the instruction aimed to develop immunity to propaganda. If the experimentals accepted a statement as often as the controls did, chiefly on the basis of the prestige of the situation in which it was offered, they had not developed the quality of suspended judgment to a degree that could be distinguished from that exhibited by the controls. Since both the experimentals and the controls were confronted with the same propaganda situation, any superiority in immunity possessed by the experimentals had a chance to be demonstrated by a smaller shift in attitude.


Even though the propaganda selection 
itself did not admit that there was a tenable 
case for opposing capital punishment, the 
directions to the pupils stated clearly that 
opinions toward the use of capital punish- 
ment do vary. In addition, each teacher told 
his pupils that the results of the tests would 
not count on their school marks. Thus it is 
clear that no “pressure” was applied to the 
pupils as they registered their initial attitudes. 
The propaganda definitely aimed to exert 
“pressure” toward a more favorable response 
to the use of capital punishment on the post- 
test of attitude. If the experimentals were 
relatively more immune to the effects of this 
propaganda, here was the chance for them to 
show their resistance by refusing to alter 
their initial attitudes. 











The names of the parts and the order of 
their presentation in the “test and propa- 
ganda battery” were given earlier in this 
article. The entire battery was given in a 
single sitting of forty-one minutes working 
time. Seven minutes were allowed for each of 
the two attitude tests, and for reading the 
propaganda selection, and twenty minutes 
were allowed for the achievement test. These 
time limits were based upon observations 
made during the preliminary trials. 

It was considered desirable to administer 
the “test and propaganda battery” in a single 
sitting in order to make it impossible for the 
results of the post-test of attitude to be 
affected by discussions of the attitude vari- 
able or of the propaganda selection either 
among the pupils of the same or both groups 
or among pupils and parents, pupils and 
teachers, or pupils and others. In order to 
avoid the effect on attitude which might pos- 
sibly operate as a result of taking the achieve- 
ment test, the initial attitude test was placed 
first in the battery. 


The Delayed-Test of Attitude 

The delayed-test of attitude was used in 
order to reveal the persistency of the attitude- 
shift resulting from the propaganda reading 
selection, and to reveal the differences be- 
tween experimentals and controls after a two- 
week interval had given opportunity for dis- 
cussion of the second phase of the experiment. 
The cooperating teachers were instructed to 
refrain from formal classroom study of the 
“test and propaganda battery” during this 
two-week period. 


RESULTS REVEALED BY A STUDY OF 
STUDENTS’ WORK AND 
TEACHERS’ REPORTS 


An attempt was made to form a judgment 
concerning the degree to which the unit of 
instruction, Public Opinion and Propaganda, 
was effective for use in teaching resistance to 
propaganda, as revealed by its use with the 
experimental group. Three types of materials 
were available as bases for this judgment. 
First, each pupil did all his written work in 
the study booklet which contained all the di- 
rections considered essential for studying the 
unit. Second, each pupil was given an exam- 
ination near the close of the instructional 
period. This examination covered the impor- 
tant points with which it was desired that 
each pupil would become familiar as a result
of his study. Third, each teacher submitted a 
log in which he recorded, in anecdotal form, 
certain experiences encountered while teach- 
ing the unit. The following statements con- 
stitute a subjective evaluation of the effective- 
ness of the unit of instruction based upon 
analyses of the three types of materials listed 
above: 


1. In general, the unit of instruction, 
Public Opinion and Propaganda, proved 
suitable for use in eleventh and twelfth 
grade social studies classes with respect 
to its interest appeal, range of difficulty, 
and organization. 

2. The responses to the suggested student 
activities and to the essay questions in 
the unit examination support the con- 
clusion that, after the instructional unit 
had been completed, a large majority of 
the pupils in the experimental group had 
developed the ability to make satisfac- 
tory verbal descriptions of the common 
propaganda devices, and to enumerate 
steps to take in resisting propaganda. 
Each of the items in the unit examina- 
tion possessed some power to discrimi- 
nate between high and low scoring 
pupils. Since the examination was given 
to the experimental group only, it was 
not possible to say what level of under- 
standing of propaganda this group had 
reached as compared with the control 
group. 

3. The teachers felt that more effective re- 
sults would have followed had more time 
been allowed for class discussion. Pos- 
sibly an additional time allotment would 
have improved pupil ability to resist 
propaganda. Later in this article the degree of correlation between knowledge concerning the devices of propaganda, as measured by the unit examination, and the magnitude of attitude-shift in response to propaganda will be reported.
The size of this correlation should help 
to indicate whether or not more time 
given to a unit of this type would be 
profitable. 

4. Certain teachers reported that their ex- 
perience with the unit had led them to 
doubt whether pupils could detect 
propaganda devices unless they had a 
background of information concerning 
the problem at issue. However, as 
stated above, the written work of the pupils in the experimental group presented convincing evidence that a large majority of them were able to make apparently adequate verbal statements concerning common propaganda devices, and to enumerate steps to take in resisting their effects.


RESULTS OF THE TESTING PHASE OF 
THE EXPERIMENT AND THEIR 
STATISTICAL TREATMENT 


Objective Standard of Effectiveness 


The second standard for judging the effectiveness of the unit of instruction used in this
experiment was based on objective test re- 
sults. The unit of instruction was to be 
judged effective according to the objective 
standard only if the mean shift in attitude in 
response to propaganda on the part of the 
experimental group was less than that of the 
control group by a statistically significant 
amount. Answers were sought to these ques- 
tions: (1) Was the immediate shift of atti- 
tude on the part of the control group statis- 
tically significant? (2) Was the delayed shift 
of attitude on the part of the control group 
statistically significant? (3) Was the imme- 
diate shift of attitude on the part of the ex- 
perimental group statistically significant? 
(4) Was the delayed shift of attitude on the 
part of the experimental group statistically 
significant? (5) Was the difference between 
the experimental and control groups in 
immediate-attitude-shift statistically significant, and which group did it favor? (6) Was the difference between the experimental and control groups in delayed-attitude-shift statistically significant, and which group did it favor?

Correlation coefficients were used to answer 
the question involved in the second and third 
purposes of the problem. This question may 
be stated as follows: How were both knowl- 
edge and intelligence, respectively, correlated 
with initial attitude, with deviation from neu- 
tral in initial attitude, with immediate shift 
of attitude, and with delayed shift of attitude 
on the part of the control and the experimental groups considered separately?


As this experiment progressed, an addi- 
tional basis for a partial evaluation of the 
effectiveness of the unit of instruction was 
considered. It was thought that it would be 
interesting to know whether scores on the unit
examination, made by the pupils in the ex- 
perimental group, were in any way correlated 
with immediate or delayed shift of attitude. 
If a marked degree of negative correlation 
were found, it could be used to support the 
argument that the greater the understanding 
of the content of the unit the greater would 
be the immunity to propaganda. 


Statistical Techniques Used to Determine the 
Reliability of Differences 

Recently attention has been called to the 
fact that “the special sampling error tech- 
niques that have been developed by ‘Student’ 
and R. A. Fisher for use with small samples 
appear, in the light of their potential value in 
educational research, to have been seriously 
neglected by students of education.”55 This
statement is based on the fact that, while the 
commonly used “large sample” techniques 
assume that the pupils in the sample shall 
have been selected at random, these tech- 
niques have often been applied erroneously 
to results from experiments with samples 
which may more properly be considered as a 
random selection of school classes, than as a 
random selection of pupils. 


Thus the twenty pairs of classes in this 
experiment collectively constituted a small 
sample. Each pair of classes was split at ran- 
dom, one class being assigned to the experi- 
mental and the other to the control group of 
classes. The experimental group then consisted of twenty “cases” and the control group likewise included twenty “cases.”

Since the experimental and control groups 
were each considered as consisting of twenty 
cases, the pupils in each class were considered 
collectively as a single unit. In each case the 
mean score of the pupils in a class was used 
to describe the performance of the pupils in 
that class. 

The initial and post-attitude-scores were available for each pupil in both the control and experimental groups. The significance of any difference between the initial and post-attitude-means for either the experimental or the control group, considered separately, was determined by a method outlined by Fisher.56 The difference between the initial and post-attitude-means was found for each of the twenty pairs of means; the mean of these twenty differences was then computed. Finally it was determined whether or not this mean difference differed significantly from zero. The same procedure was followed to determine the significance of any difference between the initial and delayed-attitude-means for each group separately.

55 From notes based on class lectures by E. F. Lindquist, State University of Iowa.

56 Ronald A. Fisher, Statistical Methods for Research Workers, pp. 132-133. London: Oliver and Boyd,
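The procedure just described (one difference per class, then a test of whether the mean of those differences departs from zero) can be sketched in Python. This is a minimal illustration under stated assumptions, not the writer's own computation: the function name `mean_difference_t` and the four-class data are hypothetical, and the standard error uses the small-sample formula that the article gives for “Student’s” t.

```python
import math

def mean_difference_t(initial_means, later_means):
    """Paired class means -> (mean difference, its standard error, t against zero).

    Each class contributes one difference: its later mean minus its initial mean.
    Hypothetical helper for illustration only.
    """
    if len(initial_means) != len(later_means) or len(initial_means) < 2:
        raise ValueError("need at least two paired class means")
    diffs = [later - first for first, later in zip(initial_means, later_means)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sum of squared deviations of the class differences from their mean.
    ss = sum((d - mean) ** 2 for d in diffs)
    se = math.sqrt(ss / (n * (n - 1)))
    return mean, se, mean / se

# Hypothetical initial and post-attitude means for four classes (not study data).
initial = [5.5, 6.0, 5.8, 6.2]
post = [6.9, 7.1, 7.0, 7.6]
m, se, t = mean_difference_t(initial, post)
print(f"mean shift {m:.3f}, SE {se:.4f}, t {t:.2f} with {len(initial) - 1} df")
```

Treating each class, rather than each pupil, as one case is what makes the sample “small”: with twenty classes the test has 19 degrees of freedom regardless of how many pupils sat in each room.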

The method referred to in the above para- 
graph was applied to determine whether or 
not the mean difference in shift between the 
control and experimental groups was signifi- 
cantly different from zero. 
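Applied to the control-minus-experimental shifts, the same method also expresses the objective standard of effectiveness stated earlier: the unit counts as effective only if the experimental classes shifted significantly less than their paired control classes. The sketch below assumes twenty hypothetical class pairs and uses the 1% value of t for 19 degrees of freedom (2.861) quoted in this article; the function name `shift_difference_test` is illustrative only.

```python
import math

def shift_difference_test(control_shifts, experimental_shifts, critical_t):
    """Test mean(control shift - experimental shift) over paired classes against zero.

    Returns (mean difference, t, effective). The unit is judged effective only
    if t is positive (experimental classes shifted less) and reaches critical_t.
    """
    diffs = [c - e for c, e in zip(control_shifts, experimental_shifts)]
    n = len(diffs)
    mean = sum(diffs) / n
    ss = sum((d - mean) ** 2 for d in diffs)  # squared deviations of the differences
    t = mean / math.sqrt(ss / (n * (n - 1)))
    return mean, t, t >= critical_t

# Twenty hypothetical class pairs; the difference alternates between .15 and -.05.
control = [1.30 + 0.02 * i for i in range(20)]
experimental = [c - (0.15 if i % 2 == 0 else -0.05) for i, c in enumerate(control)]
mean_diff, t, effective = shift_difference_test(control, experimental, critical_t=2.861)
print(f"mean difference {mean_diff:.3f}, t {t:.2f}, effective: {effective}")
```

Here the mean difference favors the experimental classes, but its t of about 2.18 falls short of 2.861, so under the study's criterion the hypothetical unit would not be judged effective.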


The test of significance employed to determine whether mean differences were greater than those which could be attributed to chance was “Student’s” t-test.57 “Student’s” “t” is the ratio obtained by dividing the difference between the mean of a small sample and any given value by the estimated standard error of that mean. The standard error of the mean of such a small sample is estimated by the formula:

$$SE_M = \sqrt{\frac{\sum d^2}{N(N-1)}}$$

in which SE_M equals the standard error of the mean of a small sample, Σd² equals the sum of the squared deviations of each individual score from the mean, and N equals the number of cases. The formula for “t” then becomes:

$$t = \frac{M - M_T}{\sqrt{\frac{\sum d^2}{N(N-1)}}}$$

in which M is the mean of the sample and M_T is the given value with which that mean is compared (zero, for every test reported in this study).
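As a check on the formula, the summary figures printed for the control-group shifts in Table I (a raw sum of squares Σx² of 34.49 and a total T of 25.63 over twenty classes) can be run through it. This is a sketch; the helper names are hypothetical, and small departures from the tabled .0659 and 19.42 are expected because the table rounds Σd² to 1.65 and the mean to 1.28 before dividing.

```python
import math

def small_sample_se(sum_sq_dev, n):
    """SE of a small-sample mean: sqrt(sum of squared deviations / (N(N - 1)))."""
    return math.sqrt(sum_sq_dev / (n * (n - 1)))

def t_against(mean, given_value, sum_sq_dev, n):
    """Student's t for the difference between a sample mean and a given value."""
    return (mean - given_value) / small_sample_se(sum_sq_dev, n)

n = 20                                # twenty control classes
sum_x2, total = 34.49, 25.63          # control column of Table I, first testing period
sum_sq_dev = sum_x2 - total ** 2 / n  # Σd² = Σx² - T²/N
se = small_sample_se(sum_sq_dev, n)
t = t_against(total / n, 0.0, sum_sq_dev, n)  # the given value is zero
print(f"Σd² = {sum_sq_dev:.2f}, SE = {se:.4f}, t = {t:.2f}")
```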


Since, for small samples, “t” is not distributed in the form of a normal curve, but is distributed variously, depending upon the size of the sample, the meaning of “t” differs for samples of different sizes. A table has been prepared by “Student”58 which shows, for each number of degrees of freedom from 1 to 30, what absolute value of “t” would be exceeded in 1%, 2%, 5%, 10%, 20%, . . . 90% of all such samples. The degrees of freedom depend upon the size of the sample: for the mean of N cases they number N − 1, so that each set of twenty classes in this study provides 19 degrees of freedom.
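The tabled percentages can be approximated without the printed table. The sketch below assumes numerical integration of the Student t density by Simpson's rule, a substitute for “Student’s” table rather than the method used in the study; the function names are hypothetical. For 19 degrees of freedom its results should agree closely with the tabled values the article relies on: 2.861 at 1%, .533 at 60%, .391 at 70%, and .127 at 90%.

```python
import math

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Proportion of samples in which |t| would exceed the given value by chance."""
    b = abs(t)
    if b == 0:
        return 1.0
    steps += steps % 2                  # Simpson's rule needs an even number of steps
    h = b / steps
    s = t_density(0.0, df) + t_density(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = s * h / 3            # area under the density between 0 and |t|
    return 1.0 - 2.0 * central_area

df = 19  # twenty classes per group, so N - 1 = 19
for value in (2.861, 0.533, 0.405, 0.391, 0.127):
    print(f"t = {value:5.3f}  exceeded by chance in {100 * two_tailed_p(value, df):5.1f}% of samples")
```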

All computations have been based upon 
unweighted means. This was justified, since 
the classes did not vary greatly in size. 
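The justification for unweighted means can be illustrated with four control classes from Table I (classes 1, 4, 6 and 7, with 29, 31, 34 and 23 pupils). This is only a sketch on that subset, not a recomputation of the full table: averaging the class means directly and weighting them by class size give almost the same result when class sizes are similar.

```python
# (N, mean immediate attitude shift) for control classes 1, 4, 6 and 7 of Table I
classes = [(29, 1.40), (31, 1.22), (34, 1.47), (23, 1.40)]

unweighted = sum(shift for _, shift in classes) / len(classes)
weighted = sum(n * shift for n, shift in classes) / sum(n for n, _ in classes)
print(f"unweighted {unweighted:.4f}, weighted {weighted:.4f}")
```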

57 Ibid., p. 166.

58 Ibid., p. 166.
Comparison of Attitude-Shift for Control and
Experimental Groups, First Testing 
Period 


Table I shows the mean initial and post-attitude-scores, and the mean immediate shift of attitude scores, for both the control and experimental groups, as revealed by the first testing period. It also shows the mean difference in shift of attitude scores between the control and experimental groups.


The mean initial and post-attitude-scores 
for the control group were 5.84 and 7.12 
respectively. The mean shift in attitude score 
was thus 1.28 in the direction of a more favor- 
able attitude toward capital punishment. The 
t-value of 19.42 for this difference exceeds the 
value (2.861) required for the 1% level of 
significance for 19 degrees of freedom. 


The mean initial and post-attitude-scores 
for the experimental group were 5.92 and 
7.16, respectively. The mean shift in attitude 
score was thus 1.24 in favor of capital pun- 
ishment. The t-value of 13.00 for this differ- 
ence exceeds the value (2.861) required for 
the 1% level of significance for 19 degrees of 
freedom. 


When the control and experimental groups were compared with respect to shift of attitude, the mean difference in shift to a more favorable attitude toward capital punishment was .04 in favor of the control group. A t-value of .391 would be exceeded by chance in seventy per cent of samples of this size, and a t-value of .533 would be exceeded in sixty per cent of samples of this size. Since the t-value of .405 for this difference lies between .391 and .533, the difference of .04 in favor of the control group is clearly not statistically significant.
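Each of the three comparisons above amounts to dividing a mean by its standard error (the SE row of Table I) and comparing the result with 2.861. A minimal sketch, using only figures reported for the first testing period:

```python
T_CRITICAL = 2.861  # 1% level of significance for 19 degrees of freedom (from the text)

# (comparison, mean shift, standard error of the mean) for the first testing period
first_period = [
    ("control: post minus initial", 1.28, 0.0659),
    ("experimental: post minus initial", 1.24, 0.0954),
    ("control shift minus experimental shift", 0.04, 0.0987),
]
results = {}
for label, mean, se in first_period:
    t = mean / se
    results[label] = t
    verdict = "significant" if abs(t) >= T_CRITICAL else "not significant"
    print(f"{label:40s} t = {t:6.3f}  ({verdict} at the 1% level)")
```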


The results summarized in Table I, and 
discussed in the above paragraphs, support 
two conclusions. First, the propaganda read- 
ing selection was effective in causing an imme- 
diate shift in the mean-attitude-scores of both 
the control and the experimental groups to a 
statistically significant degree in the intended 
direction. Second, the mean difference between the control and experimental groups in such shift was not statistically significant.59


59 In this study a difference was considered statistically significant if the t-value for the difference was equal to or in excess of the value required for the 1% level of significance for the number of degrees of freedom involved.
TABLE I

MEAN-INITIAL-ATTITUDE-SCORES, MEAN-POST-ATTITUDE-SCORES, AND MEAN IMMEDIATE SHIFTS OF ATTITUDE FOR CONTROL AND EXPERIMENTAL GROUPS; AND DIFFERENCES IN MEAN-IMMEDIATE-ATTITUDE-SHIFTS FOR CONTROL AND EXPERIMENTAL GROUPS

(FIRST TESTING PERIOD)

            Control Group          |      Experimental Group          | Difference
Class    N  Initial    Post  Shift |     N  Initial    Post  Shift |  (C - E)
    1   29     5.47    6.87   1.40 |    27     5.92    7.48   1.56 |     -.16
    2   26     6.10    7.40   1.30 |    21     6.20    6.79    .59 |      .71
    3    …     5.96    6.82    .86 |    26     5.53    7.43   1.90 |    -1.04
    4   31     5.91    7.13   1.22 |    32     5.74    7.06   1.32 |     -.10
    5    …     5.55    7.15   1.60 |    31     6.19    7.72   1.53 |      .07
    6   34     5.88    7.35   1.47 |    29     6.39    7.43   1.04 |      .43
    7   23     6.10    7.50   1.40 |    23     5.70    6.70   1.00 |      .40
    8   37     6.46    7.49   1.03 |    32     6.06    7.40   1.34 |     -.31
    9   31     5.79    7.15   1.36 |    31     4.84    7.09   2.25 |     -.89
   10   32     5.85    6.91   1.06 |    36     5.35    6.81   1.46 |     -.40
   11    …        …       …      … |     …        …       …      … |        …
   12   42     5.15    6.49   1.34 |    33     5.74    6.98   1.24 |      .10
   13   27     5.86    7.57   1.71 |    28     6.05    7.39   1.34 |      .37
   14   27     5.91    7.02   1.11 |    29     6.09    6.85    .76 |      .35
   15   36     6.44    7.11    .67 |    30     6.16    6.73    .57 |      .10
   16   38     5.31    6.20    .89 |    39     6.36    6.94    .58 |      .31
   17   33        …       …   1.86 |    33     5.49    6.81   1.32 |      .54
   18   23     5.11    6.68   1.57 |    27     5.56    7.09   1.53 |      .04
   19   26     6.22    7.46   1.24 |    18     6.68    7.69   1.01 |      .23
   20   27     6.20    7.56   1.36 |     …     6.39    7.66   1.27 |      .09
    T  604   116.71  142.24  25.63 |   574   118.35  143.22  24.87 |      .76
    M          5.84    7.12   1.28 |           5.92    7.16   1.24 |      .04

                         Control    Experimental    Difference
                          Shift        Shift          (C - E)
Σx²                      34.49          …               …
T²/N                     32.84        30.93            .03
Σd² (= Σx² - T²/N)        1.65         3.46           3.70
SE_M                     .0659        .0954          .0987
t                        19.42        13.00           .405
Per cent level            1.00         1.00          70.00

Initial and Post = mean initial and post-attitude scores; Shift = mean immediate attitude shift; Difference (C - E) = control mean immediate attitude shift minus experimental mean immediate attitude shift.
Comparison of Attitude-Shift for Control and Experimental Groups, Second Testing Period

Table II shows the mean initial and delayed-attitude-scores, and the mean shift of attitude scores, for both the control and experimental groups. The delayed-attitude-scores resulted from the administration of the delayed test of attitude in the second testing period. It will be recalled that this second testing period followed the first after a two-week interval. The reduced number of pupils in this table for both groups, as compared with the number in Table I, was due to absences at the time of the second testing period.

The mean initial and delayed-attitude-scores for the control group were 5.85 and 6.79, respectively. The mean shift in attitude score was thus .94 in favor of capital punishment. The t-value of 13.93 for this difference exceeds the value (2.861) required for the 1% level of significance for 19 degrees of freedom.

The mean initial and delayed-attitude-scores for the experimental group were 5.95 and 6.88, respectively. The mean shift in attitude score was thus .93 in favor of capital punishment. The t-value of 14.37 for this difference exceeds the value (2.861) required for the 1% level of significance for 19 degrees of freedom.


When the control and experimental groups were compared with respect to shift of attitude after this lapse of two weeks, the mean difference in shift to a more favorable attitude toward capital punishment was .01 in favor of the control group. Since a t-value of .127 would be exceeded by chance in ninety per cent of samples of this size, the t-value of .107 for this difference clearly shows that the difference of .01 in favor of the control group is not statistically significant.
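The second-period figures can be traced through the whole chain from the sums printed in Table II: Σd² is Σx² minus T²/N, the standard error follows from the small-sample formula, and t is the mean divided by that standard error. A sketch, using only the tabled sums and means:

```python
import math

N_CLASSES = 20
T_CRITICAL = 2.861  # 1% level for N_CLASSES - 1 = 19 degrees of freedom

# (comparison, mean shift, Σx² of the class shifts, T²/N) for the second testing period
second_period = [
    ("control: delayed minus initial", 0.94, 19.48, 17.75),
    ("experimental: delayed minus initial", 0.93, 18.85, 17.26),
    ("control shift minus experimental shift", 0.01, 3.32, 0.00),
]
t_values = {}
for label, mean, sum_x2, t2_over_n in second_period:
    sum_sq_dev = sum_x2 - t2_over_n                     # Σd²
    se = math.sqrt(sum_sq_dev / (N_CLASSES * (N_CLASSES - 1)))
    t_values[label] = mean / se
    verdict = "significant" if t_values[label] >= T_CRITICAL else "not significant"
    print(f"{label:40s} SE = {se:.4f}  t = {t_values[label]:6.3f}  {verdict}")
```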

The results summarized in Table II, and discussed in the above paragraphs, support two conclusions. First, the propaganda reading selection was effective in causing a shift in the mean-attitude-score of both the experimental and the control groups to a statistically significant degree in the intended direction when the end attitude was measured after a lapse of two weeks from the time the selection was read. Second, the mean difference between the control and experimental groups in such shift was not statistically significant.


The Relation of Knowledge and Intelligence to Attitude and Attitude Shift

For the experimental and control groups separately, the within-class correlations were determined between: (1) achievement and initial attitude,60 (2) achievement and immediate-attitude-shift,61 (3) achievement and delayed-attitude-shift,62 (4) intelligence and initial attitude,63 (5) intelligence and immediate-attitude-shift, (6) intelligence and delayed-attitude-shift, (7) achievement and deviation from a neutral score on the initial attitude test, and (8) intelligence and deviation from a neutral score on the initial attitude test.64 In order to eliminate the effect of school differences and determine the correlation within classes, these correlations were computed by the method of analysis of covariance described by Fisher.65

Table III shows these correlations and the number of cases upon which each was based. For example, the correlation between intelligence and immediate-attitude-shift for the 601 pupils in the control group for whom such measures were available was .096. A study of this table reveals that all the correlations were negligible in magnitude.
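A within-class correlation of the kind yielded by an analysis of covariance can be sketched as follows: each pupil's scores are taken as deviations from his own class means, and the sums of squares and products are pooled across classes before correlating, so that differences between schools drop out. The function names and the two-class data below are hypothetical, and the sketch shows the general technique rather than the writer's worksheets.

```python
import math

def within_class_r(groups):
    """Pooled within-class correlation from deviations about each class's own means.

    groups: list of (xs, ys) pairs, one pair of equal-length score lists per class.
    """
    sxx = syy = sxy = 0.0
    for xs, ys in groups:
        mx = sum(xs) / len(xs)
        my = sum(ys) / len(ys)
        for x, y in zip(xs, ys):
            sxx += (x - mx) ** 2
            syy += (y - my) ** 2
            sxy += (x - mx) * (y - my)
    return sxy / math.sqrt(sxx * syy)

def total_r(groups):
    """Ordinary correlation that ignores class membership, for comparison."""
    xs = [x for gx, _ in groups for x in gx]
    ys = [y for _, gy in groups for y in gy]
    return within_class_r([(xs, ys)])

# Hypothetical scores: within each class y rises with x, but class B has higher x
# and lower y overall, so pooling the raw scores reverses the sign.
groups = [([1, 2, 3], [2, 4, 6]), ([10, 11, 12], [0, 2, 4])]
print(round(within_class_r(groups), 3), round(total_r(groups), 3))
```

The contrast between the two results is the reason for the within-class method: class and school differences can produce, mask, or reverse a relationship that holds among pupils taught together.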


Relationship Between Scores on Unit Examination and Attitude Shift on the Part of Pupils of the Experimental Group

The correlations between scores on the unit examination and immediate-attitude-shift, and between those scores and delayed-attitude-shift, were computed on the basis of all pupils in the experimental group for whom the necessary scores were available. The analysis of covariance as described by Fisher66 was employed to determine each of these correlations. The correlations were .042 and .114, respectively. These are within-class correlations.

These negligible correlations support the conclusion that possession of information concerning the tricks of propaganda, at the level measured by the unit examination, does not result in increased resistance to propaganda.

60 Achievement was measured in terms of scores on the capital punishment achievement test.

61 Immediate-attitude-shift refers to the shift of attitude resulting from the reading of the propaganda selection. The immediate shift for each pupil was found by subtracting his initial attitude score from his post-attitude score. Post attitude was measured by Form B of the capital punishment attitude test.

62 Delayed-attitude-shift for each pupil was found by subtracting his initial attitude score from his delayed attitude score. Delayed attitude was measured by re-administering Form B of the capital punishment attitude test two weeks following the first testing period.

63 The intelligence quotient for each pupil was used.

64 The neutral score on both forms of the attitude test was 5.50. The signs of the deviations were ignored.

65 Ronald A. Fisher, op. cit., pp. 275-290.

66 Ibid.

TABLE II

MEAN-INITIAL-ATTITUDE SCORES, MEAN-DELAYED-ATTITUDE SCORES AND MEAN DELAYED SHIFTS OF ATTITUDE FOR CONTROL AND EXPERIMENTAL GROUPS; AND DIFFERENCES IN MEAN-DELAYED-ATTITUDE SHIFTS FOR CONTROL AND EXPERIMENTAL GROUPS

(SECOND TESTING PERIOD)

            Control Group          |      Experimental Group          | Difference
Class    N  Initial Delayed  Shift |     N  Initial Delayed  Shift |  (C - E)
    1   29     5.47    6.09    .62 |    26     5.87    6.88   1.01 |     -.39
    2   25     6.14    7.37   1.23 |    20     6.24    7.11    .87 |      .36
    3    …     5.88    6.56    .68 |    26     5.53    6.79   1.26 |     -.58
    4   29     5.93    6.74    .81 |    27     6.04    6.90    .86 |     -.05
    5   29     5.65    6.77   1.12 |    29     6.24    7.08    .84 |      .28
    6    …     5.87    6.83    .96 |    29     6.39    7.67   1.28 |     -.32
    7    …     5.91    7.01   1.10 |    22     5.67    6.19    .52 |      .58
    8   36     6.52       …      … |    31     6.08    7.01    .93 |        …
    9   30     5.72    6.63    .91 |    30     4.91    6.40   1.49 |     -.58
   10   32     5.85    6.88   1.03 |    34     5.36    6.45   1.09 |     -.06
   11   31     6.09    7.14   1.05 |    28     5.90    6.68    .78 |      .27
   12   39     5.13    5.66    .53 |    33     5.74    6.79   1.05 |     -.52
   13   26     5.97    6.94    .97 |    26     6.14    7.34   1.20 |     -.23
   14   26     5.85    6.78    .93 |    27     6.13    6.52    .39 |      .54
   15    …     6.63    6.91    .28 |    29     6.15    6.68    .53 |     -.25
   16   35     5.41    6.09    .68 |    36     6.44    7.13    .69 |     -.01
   17    …     5.42    6.99   1.57 |    29     5.55    6.19    .64 |      .93
   18   23     5.22    6.53   1.31 |    25     5.53    6.75   1.22 |      .09
   19   26     6.22    7.46   1.24 |    16     6.66    7.49    .83 |      .41
   20   27     6.20       …      … |    17     6.39    7.49   1.10 |        …
    T  579   117.08  135.92  18.84 |   540   118.96  137.54  18.58 |      .26
    M          5.85    6.79    .94 |           5.95    6.88    .93 |      .01

                         Control    Experimental    Difference
                          Shift        Shift          (C - E)
Σx²                      19.48        18.85           3.32
T²/N                     17.75        17.26            .00
Σd² (= Σx² - T²/N)        1.73         1.59           3.32
SE_M                     .0675        .0647          .0935
t                        13.93        14.37           .107
Per cent level            1.00         1.00          90.00

Initial and Delayed = mean initial and delayed-attitude scores; Shift = mean delayed attitude shift; Difference (C - E) = control mean delayed attitude shift minus experimental mean delayed attitude shift.


Reliabilities of the Tests 

The reliabilities of the achievement test and unit examination were determined by the chance-halves method. To determine the reliabilities of the attitude tests, a method suggested by Thurstone and Chave67 was used. These reliabilities were: achievement test, .67; attitude test, Form A, .77; attitude test, Form B, .84; and unit examination, .56. Kelley68 shows that reliabilities such as these are sufficiently high for use in group measurement.

TABLE III

CORRELATIONS OF ACHIEVEMENT AND INTELLIGENCE WITH ATTITUDE AND ATTITUDE SHIFT

                                    Control Group                  Experimental Group
                             Intelligence   Achievement       Intelligence   Achievement
                               N       r      N       r         N       r      N       r
Initial Attitude             604   -.112    608   -.177       575   -.046    576   -.056
Immediate Attitude Shift     601    .096    606    .071       573    .001    574    .082
Delayed Attitude Shift       573    .161    579    .067       538    .041    540    .091
Deviation of Initial
  Attitude from Neutral      604    .107    608    .013       575    .090    576    .077

N = number of cases; r = within-class correlation.
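The chance-halves method can be sketched as follows: the items of a test are assigned to two halves by chance, each pupil's score on each half is totalled, and the two sets of half-scores are correlated. The article does not state whether the half-test correlation was then stepped up to full length; this sketch assumes the Spearman-Brown formula for that step, and the function names and item data are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length (assumed step)."""
    return 2 * r_half / (1 + r_half)

def chance_halves_reliability(item_scores, seed=0):
    """item_scores: one list of item scores (e.g. 1 = right, 0 = wrong) per pupil."""
    n_items = len(item_scores[0])
    items = list(range(n_items))
    random.Random(seed).shuffle(items)   # assign items to the two halves by chance
    half_a, half_b = items[: n_items // 2], items[n_items // 2 :]
    a = [sum(row[i] for i in half_a) for row in item_scores]
    b = [sum(row[i] for i in half_b) for row in item_scores]
    return spearman_brown(pearson_r(a, b))

# Hypothetical ten-item test: pupil k answers the first k items correctly.
pupils = [[1 if j < k else 0 for j in range(10)] for k in range(11)]
print(round(chance_halves_reliability(pupils), 3))
```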


Analyses of Tests

The tests constructed by the writer (the achievement test and the unit examination) were carefully analyzed. They appeared to be acceptable in the light of generally accepted criteria.69

CONCLUSIONS


Limitations 

Throughout this study careful considera- 
tion was given to certain limitations which 
accompany research of this type. The follow- 
ing statements present a summary of these 
limitations: 


1. Propaganda in printed, textual form 
only was used in this study. Possibly 
the use of other types of propaganda 
would yield different results. 

2. Only one social issue was used as a basis 
for this study. Although there appears 
to be little reason to believe that the use 
of a different social issue would have 
yielded different results, it must be ad- 
mitted that experimentation would be 
necessary to establish the validity of this 
assumption. 

3. Possibly the pupils responded to the attitude tests in a manner which they believed would be consistent with their teachers’ wishes. However, both the experimental and control groups worked under the same testing conditions, and any superior resistance possessed by the experimental group had an adequate opportunity to exhibit itself.

4. This experiment involved measurement and hence its findings are limited by the validity and reliability of each of the measuring devices used. The tests used as a basis for measurement in this study had satisfactory reliabilities. The attitude tests used in this study have been generally accepted as valid. In making the unit examination and the achievement test, attempts were made to sample each important division of content comprising the bodies of knowledge implied by the titles of the tests. These tests can be considered valid to the extent that such attempts may be judged successful.

67 L. L. Thurstone and E. J. Chave, op. cit., pp. 65-66.

68 Truman L. Kelley, Interpretation of Educational Measurements, pp. 210-211. New York: World Book Company, 1927.

69 The original dissertation includes a complete item analysis of each test.


Conclusions 

These conclusions must be considered only 
in the light of the limitations presented in 
the preceding section. However, the results 
of this experiment were highly consistent, and 
they strongly support the following conclu- 
sions: 


1. The unit of instruction, Public Opinion 
and Propaganda, was suitable for use in 
eleventh and twelfth grade social studies 
classes with respect to its interest ap- 
peal, range of difficulty, and organ- 
ization. 

2. Even though there was strong evidence 
to indicate that the pupils in the experi- 
mental group did develop an increased 
awareness of the methods of the propa- 
gandist, the objective standard of effec- 
tiveness, used in this experiment, showed 
that the study of the unit of instruction, 
Public Opinion and Propaganda, did not
prove to be effective in developing re- 
sistance to propaganda on the part of 
these pupils. 

The negligible correlations between 
measures of knowledge concerning pro- 
paganda devices and measures of imme- 
diate and delayed shift of attitude in 
response to propaganda as found in this 
study, strongly suggest that attempts 
to teach resistance to propaganda with 
respect to social issues by emphasis only 
on the “form” in which propaganda 
commonly appears will be unlikely to 
succeed. It seems unlikely, therefore, 
that even a longer time allotment for 
the unit of instruction used in this ex- 
periment would have developed greater 
resistance to propaganda on the part of 
the experimental group than on the part 
of the control group. 


_ To the extent that pupils’ reactions to- 
ward propaganda with respect to capital 
punishment, as found in this study, can 
be accepted as a basis for generalization, 
attitudes of senior high school social 
studies pupils toward social issues can 
be shifted in a pre-determined direction 
by means of propaganda in the form of 
a literary selection even when careful 
study of methods of resisting propa- 
ganda has been completed by these 
pupils less than one month prior to their 
being subjected to such propaganda. In 
the absence of additional propaganda 
with the same or a counter emphasis, 
such shift of attitude tends to be rela- 
tively permanent as shown by the fact 
that, in this study, the shift was still 
statistically significant after a delay of 
two weeks. This tendency for a new 
attitude to have permanence has special 
significance when it is considered in the 
light of this study. That the experimen- 
tal group did not differ in attitude from 
the control group after this two-week 
delay, is additional evidence that a 
study of the “tricks” of the propagandist 
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The low correlations between knowledge 
concerning a social issue and attitude 
toward that issue, obtained in this study, 
constitute strong evidence that measures 
of knowledge concerning a social issue 
possessed by high school pupils are in- 
adequate for use in predicting which ex- 
treme of a social issue such pupils will 
tend to favor. The same conclusion 
holds for measures of intelligence, so far 
as this study is concerned, since the ob- 
tained correlations between intelligence 
and attitude were also negligible in 
magnitude for both groups. 


. The low correlations between knowledge 


concerning a social issue and deviation 
from neutral in attitude toward that 
issue, obtained in this study, provide 
strong evidence that measures of knowl- 
edge concerning a social issue possessed 
by high school pupils are inadequate for 
predicting the extent to which such 
pupils will have definite convictions to- 
ward that social issue. The same con- 
clusion holds for measures of intelli- 
gence, so far as this study is concerned, 
since the obtained correlations between 
intelligence and deviation from a neu- 
tral attitude were also negligible in 
magnitude for both groups. 


. The low correlations between knowledge 


concerning a social issue and immediate 
or delayed-shift of attitude toward that 
issue in response to propaganda on the 
part of high school pupils, as revealed 
in this study, present strong evidence 
that measures of the knowledge concern- 
ing a social issue possessed by high 
school pupils are inadequate for predict- 
ing ability to resist propaganda. The 
same conclusion holds for measures of 
intelligence, so far as this study is con- 
cerned, since the obtained correlations 
between intelligence and attitude-shift 
were also negligible in magnitude for 
both groups. 


alone will not develop effective resistance to propaganda on the part of senior high school social studies pupils.

The fact that attitudes can be shifted and that such shifts tend to be permanent substantiates what previous investigators have concluded.


Educational Implications


The following statements grow out of the findings of this study:


1. The findings of this study lend emphasis to the fact that curricular materials organized for a specific teaching purpose should be tried out and revised in the light of the outcomes resulting from their use.

2. Since attitudes represent educational outcomes which very possibly may not be predicted by the usual achievement test results, teachers should include tests of attitude as a part of their program for the measurement of outcomes of instruction.


3. Even though a study of the “tricks” of
propaganda does not alone appear to be
an effective way of developing resistance 
to propaganda, the strong motivating 
character of such study, as demonstrated 
in this experiment, should be utilized in 
other units designed for use in teaching 
resistance to propaganda. Additional ex- 
perimentation should be carried on to 
determine the effectiveness of units of 
instruction which provide for the devel- 
opment of resistance to propaganda 
through attention to both the “form” 
and “content” in which propaganda 
commonly appears. 


4. Since the evidence presented by this study indicates that neither achievement nor intelligence, as commonly measured, is a dependable index of ability to resist propaganda, the writer suggests that experimentation be carried on to evaluate the effectiveness of units of instruction designed for use in developing the spirit of criticism with respect to social issues contemporaneously with the original study of such issues. Dewey suggests the importance of this type of experimentation when he charges that schooling “consists in a systematic, almost deliberate, avoidance of the spirit of criticism in dealing with history, politics, and economics.” A unit of instruction involved in such an experiment would strictly avoid “indiscriminating idealization” of our institutions, but would deal with the “realities of social struggles and problems,” and would provide for attention to “intelligent acquaintance with facts” and for “impartially conducted discussions.” In the words of Dewey, such a unit or series of units would be designed to “cultivate the habit of suspended judgment, of skepticism, of desire for evidence, of appeal to observation rather than sentiment, discussion rather than bias, inquiry rather than conventional idealizations.” Of course, this was the purpose of the unit used in the study reported in this dissertation. However, the failure of the type of approach used in this study does not mean that it is impossible to teach critical thinking. Rather, it suggests that new attacks be made on the problem.


5. While the possession of knowledge and intelligence is no doubt necessary in order to do critical thinking, the results of this experiment strongly suggest that an individual may, according to commonly obtained measures, possess both these traits to a high degree and yet be highly susceptible to propaganda influences. Possibly critical thinking can be developed best when pupils are taught in such a manner, throughout their school experiences, that they must constantly use information in problem-solving situations and in such a manner that they are constantly forced to make tentative conclusions as a result. In other words, it is just possible that the way to teach critical thinking is to give pupils long-term practice in it. Horn says:

The emphasis upon active learning 
under the guidance of purpose in con- 
trast with pedantic instruction under 
teacher domination has long been 
prominent in the writings of educa- 
tional reformers and notably, in re- 
cent times, in the writings of John 
Dewey. It is in harmony with the 
theory and evidence on learning and is 
revolutionary, in so far as it is under- 
stood and applied, in transforming in- 
struction in the social studies. Teach- 
ers become educators in the vital and 
generic sense of the term; pupils share 
in setting up problems and assume a
large part of the responsibility for 
their solution. Ideas become growing 
rather than static things. The whole 
atmosphere of the school is changed. 





John Dewey, “Education as Politics,” New Republic, XXXII (October 4, 1922), 139-141.


Ibid.


Ernest Horn, Methods of Instruction in the Social Studies, p. 124. New York: Charles Scribner’s Sons, 1937.
Dewey levels severe criticism against
current schooling when he says:


It not only does little to make discrim- 
inating intelligence a safeguard against 
surrender to the invasion of bunk, espe- 
cially in its most dangerous forms—social 
and political bunk—but it does much to 
favor susceptibility to a welcoming reception of it . . . The specialist in any one of the traditional lines is as likely to fall for social bunk even in its extreme forms of economic and nationalistic propaganda as the unschooled person; . . .


Teachers, and social studies teachers in particular, must not be content to teach information and skills alone. The results of this study suggest that such teaching is but one phase of their task.


John Dewey, op. cit., pp. 139-141.














TEACHING GEOMETRY TO CULTIVATE REFLECTIVE 
THINKING: AN EXPERIMENTAL STUDY WITH 
1239 HIGH SCHOOL PUPILS 


GILBERT ULMER 
University of Kansas 


The studies on the transfer of training 
which have dealt with transfer of reasoning 
ability from subject matter fields to more 
general fields may be divided into two groups: 


(1) Those in which little or no transfer of 
this kind was demonstrated, and 


(2) Those in which a large amount of 
transfer was secured. 


In the first group of investigations the 
teaching procedure was not recognized as an 
important factor, whereas in the second group 
definite provisions were made in the teaching 
procedure to secure transfer. As a result of 
the studies in the second group the position 
is frequently taken that if we desire transfer 
we must teach for it. It should be observed, 
however, that in practically all of the studies 
in which conscious attempts have been made 
to produce transfer of reasoning ability the 
experimental groups have been taught by the 
persons who developed the teaching methods. 
The experimenters worked under highly 
favorable conditions and, moreover, they un- 
doubtedly represent a superior type of 
teacher. The question may well be raised as 
to whether teachers can be expected to secure 
this kind of transfer under ordinary classroom 
conditions even if they make a conscientious 
attempt to teach for it. 


The purpose of the present study was to 
evaluate the results obtained by a number of 
high school geometry teachers in different 
localities who used a method of teaching 
geometry in which emphasis was placed upon 
the cultivation of reflective thinking. 


PROCEDURE 


The experiment was conducted in seven 
high schools during the first semester of the 
school year, 1938-1939. An experimental
group and two control groups were used. The 
experimental group consisted of the pupils in 
a number of geometry classes in which a definite
attempt was made to improve the quality
of pupils’ thinking by making use of the 
opportunity offered in geometry to study prin- 
ciples of reflective thinking. Many applica- 
tions of reflective thinking were considered in 
both geometric and non-geometric situations. 
One control group consisted of the pupils in 
a number of geometry classes in which no par- 
ticular emphasis was placed upon methods of 
thinking or upon the application of the kind 
of thinking done in geometry to non-geometric 
situations. The second control group con- 
sisted of pupils who were not enrolled in 
geometry and who had never previously 
studied geometry. Hereafter, these groups 
will be referred to as the Experimental Group, 
the Geometry Control Group, and the Non- 
Geometry Control Group. Complete records 
were obtained for 1239 pupils. 


The classes in the Experimental Group 
were taught by ten teachers in six different 
schools. The classes in the Geometry Control 
Group were taught by eight teachers in six 
different schools. The distribution of the 
pupils in the Experimental Group and the 
Geometry Control Group at the beginning of 
the year is shown in Tables I and II. 


TABLE I


DISTRIBUTION OF PUPILS IN EXPERIMENTAL
GROUP AT THE BEGINNING OF THE YEAR


                    Number of    Number of
Teacher    School   Classes      Pupils
E-1        I        2            67
E-2        II       2            57
E-3        III      1            35
E-4        III      2            51
E-5        III      1            31
E-6        IV       4            116
E-7        V        2            55
E-8        VI       2            63
E-9        VI       4            128
E-10       VI       1            35
Total               21           638
TABLE II


DISTRIBUTION OF PUPILS IN GEOMETRY CONTROL
GROUP AT THE BEGINNING OF THE YEAR


                    Number of    Number of
Teacher    School   Classes      Pupils
C-1        I        1            29
C-2        II       1            33
C-3        II       3            85
C-4        III      2            56
C-5        II       1            30
C-6        IV       3            80
C-7        VI       3            88
C-8        VII      1            15
Total               15           416


Each Roman numeral refers to the same 
school wherever it appears in this presenta- 
tion, and each teacher designation always re- 
fers to the same teacher. In Schools IV, VI, 
and VII, geometry is an eleventh year course; 
in the other schools it is a tenth year course. 

It was desired that the school programs of 
pupils in the Non-Geometry Control Group 
should be varied from the standpoint of sub- 
ject matter and that the only important differ- 
ence between the programs of pupils in this 
group and the programs of pupils in the other 
two groups should be with respect to the 
geometry course. However, some difficulty 
was encountered at this point. The mere fact 
that a group of pupils are at the grade level 
where geometry is offered and are not enrolled 
in geometry may be indicative of other selec-
tive factors. For example, some of the pupils
may have avoided geometry because it is gen- 
erally regarded as a difficult course; others 
may be pursuing a commercial rather than an 
academic program. Whatever objection there 
may be on this score was partially avoided by 
selecting non-geometry controls for a large 
number of the tenth grade experimental sub- 
jects from the tenth grade in a school (School 
VI) in which geometry is an eleventh grade 
course. These pupils were taken from regular 
tenth grade algebra classes and probably rep- 
resent as much variety in subject matter as 
does the entire tenth grade class in that 
school. In another school where geometry is 
an eleventh grade course (School IV), all 
eleventh grade pupils not enrolled in geometry 
and who had never studied geometry were 
included in the Non-Geometry Control Group. 
Also, some tenth grade pupils were included
from two schools (Schools I and III) in
which geometry is a tenth grade course. The 
importance of any selective factors inherent in 
the method of forming this group is minimized
by the fact that later the three groups were 
equated on three bases. The distribution of 
pupils in this group is shown in Table III. 


TABLE III


DISTRIBUTION OF PUPILS IN NON-GEOMETRY
CONTROL GROUP AT THE BEGINNING
OF THE YEAR


            Number of
School      Pupils
I           26
III         ?
IV          ?
VI          208
Total       ?


A reasoning test was given to all pupils in 
the three groups at the beginning of the 
semester and another was given at the end. 
These tests are discussed in the section on 
evaluation. After the first test had been given, 
three equated groups were formed by match- 
ing pupils with approximately the same chron- 
ological age, I.Q., and initial test score. The 
intelligence quotients used were obtained from 
school records and had been secured through 
the use of different intelligence tests. This is 
not regarded as a serious limitation, since this 
measure was only one of three used in equat- 
ing the groups. 
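The article says only that pupils were matched on approximately the same chronological age, I.Q., and initial test score; it does not describe the matching procedure. One way such three-way matching could be carried out is sketched below in Python as a greedy nearest-neighbour match. The tolerances, the scaled distance measure, the greedy order, and all names (`Pupil`, `match_triples`, and so on) are hypothetical, and a greedy match is not guaranteed to produce the largest possible set of matched triples.

```python
from dataclasses import dataclass

# Hypothetical tolerances for "approximately the same"; the article states none.
TOLERANCE = {"age_months": 3, "iq": 5, "initial_score": 3}


@dataclass(frozen=True)
class Pupil:
    pid: str
    age_months: int      # chronological age in months
    iq: int
    initial_score: int   # score on the initial reasoning test


def acceptable(a: Pupil, b: Pupil) -> bool:
    """True if b is within tolerance of a on all three matching variables."""
    return all(abs(getattr(a, f) - getattr(b, f)) <= t for f, t in TOLERANCE.items())


def distance(a: Pupil, b: Pupil) -> float:
    """Differences scaled by tolerance, so each variable carries equal weight."""
    return sum(abs(getattr(a, f) - getattr(b, f)) / t for f, t in TOLERANCE.items())


def match_triples(experimental, geom_control, non_geom_control):
    """Greedily pair each experimental pupil with the nearest unused,
    acceptable pupil from each control pool; unmatched pupils are dropped."""
    pool_g, pool_n = list(geom_control), list(non_geom_control)
    triples = []
    for e in experimental:
        cands_g = [p for p in pool_g if acceptable(e, p)]
        cands_n = [p for p in pool_n if acceptable(e, p)]
        if not cands_g or not cands_n:
            continue  # no partner in one of the pools: leave e out
        g = min(cands_g, key=lambda p: distance(e, p))
        n = min(cands_n, key=lambda p: distance(e, p))
        pool_g.remove(g)
        pool_n.remove(n)
        triples.append((e, g, n))
    return triples


# Invented pupils, for illustration only.
experimental = [Pupil("E1", 189, 111, 30), Pupil("E2", 190, 98, 20)]
geometry = [Pupil("G1", 190, 112, 31), Pupil("G2", 200, 130, 50)]
non_geometry = [Pupil("N1", 188, 110, 29)]
triples = match_triples(experimental, geometry, non_geometry)
print([(e.pid, g.pid, n.pid) for e, g, n in triples])  # [('E1', 'G1', 'N1')]
```

Pupil E2 is left out because no remaining geometry pupil falls within the tolerances, which mirrors how equating shrinks the groups: the article's equated groups contain 330 pupils each, fewer than were enrolled.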

Table IV shows the mean and standard deviation of each of the three variables used as bases for equating the groups. Those pupils who, because of transfers, absences, or withdrawals, did not take the final test have been eliminated. Also, a very small number of pupils over seventeen years of age were excluded from these equivalent groups. This was done for three reasons: (1) it would have been difficult to match the records of these pupils, (2) intelligence tests for individuals in the upper age levels possess less reliability, and (3) it was felt that pupils of this age do not represent typical high school geometry pupils.


TABLE IV

MEAN AND STANDARD DEVIATION OF AGES,
INTELLIGENCE QUOTIENTS, AND INITIAL TEST
SCORES FOR PUPILS IN THE EQUATED GROUPS


                           Experi-     Geometry    Non-Geometry
                           mental      Control     Control
                           Group       Group       Group
Number of Pupils           330         330         330
Mean Age                   15-9        15-9        15-10
S.D. Age                   ? mo.       8 mo.       8 mo.
Mean I.Q.                  111         112         111
S.D. I.Q.                  12.1        13.9        12.3
Mean Initial Test Score    30.2        30.9        29.0
S.D. Initial Test Score    12.9        13.0        11.2

It has already been indicated that in most 
of the studies in which deliberate attempts 
have been made to produce transfer of rea- 
soning ability from subject matter fields to 
more general fields, the experimental classes 
have been taught by the teachers who devel- 
oped the experimental methods. In the
present study it was desired that the experi- 
mental classes be taught by a number of 
teachers in different localities and with vary- 
ing degrees of familiarity with the experi- 
mental method. Classes taught by the writer 
were not included in the experiment. 

The teachers of the experimental classes 
were selected because they were known to be 
interested in relating geometry to clear think- 
ing. Principals and department heads were 
consulted in an effort to select highly capable 
teachers for geometry control classes. Three 
of the ten teachers of experimental classes 
also taught classes in the Geometry Control 
Group and are included among the eight 
teachers of that group. Thus, Teachers E-1,
E-2, and E-3 are the same as Teachers C-1,
C-2, and C-4, respectively.

Near the end of the school year preceding 
the semester in which the experiment was 
conducted, four teachers in four schools who 
were planning to handle classes in the Experi- 
mental Group gave the initial reasoning test 
to some of their regular geometry classes. 
This was done for the purpose of determining 
whether or not these teachers had been 
achieving superior results in cultivating crit- 
ical thinking, even before they began to mod- 
ify their teaching method in connection with 
this experimental study. It later happened 
that one of the four teachers did not take 
part in the experiment because her teaching 
program the next fall did not include any 
geometry classes. This particular teacher is 
the head of the mathematics department at 
School VI, and the scores of the pupils in her 
classes were retained. The results are shown 
in Table V. 

The scores in Table V, made at the end of the year’s course, are only slightly higher than the scores made by pupils in the experiment at the beginning of the geometry course. This result is consistent with the results reported later in this article which indicate that the gain due to maturation over the period of a single semester was so slight as to be unmeasurable.


TABLE V


SCORES ON INITIAL REASONING TEST GIVEN IN
JUNE, 1938 (PRECEDING THE EXPERIMENT)


                       Number of   Number of   Mean
Teacher      School    Classes     Pupils      Score
?            II        1           33          31.7
?            III       2           51          29.0
?            V         2           44          27.6
Dept. Head   VI        3           85          33.4


THE EXPERIMENTAL METHOD 


The essential characteristic of the experi- 
mental method was the conscious attempt to 
make pupils more critical in all of their think- 
ing by the study of thought patterns both in 
geometry and outside of geometry. Although 
all ten of the teachers of the Experimental 
Group had the same objective, there were 
differences among these teachers in the details 
of their teaching procedures and techniques. 
These differences were in part the result of 
variations in the kind and amount of prepa- 
ration which the teachers made for the 
present study and in the amount of experience 
which they had previously had with the ex- 
perimental method. There were, however, 
many classroom procedures and techniques 
which all of the teachers of this group used 
in common. For the most part these were 
contained in a manual which was prepared 
by the writer and furnished to the teachers of 
the Experimental Group. 

The manual was the outgrowth of four 
years of experimentation by the writer with 
classroom procedures and techniques in an 
effort to develop a teaching method which 
could be used to cultivate reflective thinking. 
One method thus developed was reported¹ in
1937. That method included many applica- 
tions of principles of reflective thinking to 
non-geometric material. However, the method 
necessitated teaching the course without the 
use of a textbook, and for the present study 
a manual was desired in which most of the 
essential features of that method would be 
adapted for use with a textbook. 

¹ G. Ulmer, “Teaching Geometry for the Purpose of Developing Ability to Do Logical Thinking,” Mathematics Teacher, XXX (December, 1937), 355-57.
The manual was prepared in three sections,
each of which was sent to the teachers in the 
course of the semester. The writer also held 
conferences with the teachers, at which times 
there were discussions concerning teaching 
devices. Additional suggestions were included 
in letters sent to the teachers from time to time.


THE METHOD USED WITH THE GEOMETRY
CONTROL GROUP


The method used with the geometry con- 
trol classes might be termed an up-to-date 
form of the ordinary or traditional method of 
teaching geometry. Details of the method dif- 
fered from one teacher to another in this 
control group but in general it may be said 
that the emphasis was placed upon important 
theorems and their proofs and also upon the 
solution of many original exercises. The 
memorizing of proofs was discouraged and 
pupils were urged to build their own 
arguments. 

It should not be inferred that reflective 
thinking was not an important consideration 
in the method used in the geometry control 
classes. The pupils had many opportunities 
to discriminate between tested conclusions 
and mere assertions, guesses, and opinions. 
But this kind of thinking entered only in con- 
nection with the derivation of geometric re- 
lationships. Thought processes and methods 
of thinking were not emphasized. It is at this 
point that the essential difference between the 
methods in the two groups is to be found. In 
the control classes, it was the geometric facts 
and relationships that were regarded as im- 
portant. The same relationships were studied 
in the experimental classes, but here the 
methods of thinking employed in obtaining 
those relationships and the application of 
those methods of thinking in many different 
fields received the major emphasis. 


EVALUATION 


The objective of the teaching procedures 
and techniques used with the Experimental 
Group was to affect the pupils in such a way 
that they would do more effective thinking 
throughout the rest of their lives. The meas- 
urement of the degree to which an objective 
of this type has been achieved is far more 
difficult than is the measurement of memo- 
rized facts or manipulative skills. 
For the purpose of this study it was desired
that a test be used which would measure abil- 
ity to do reflective thinking. Syllogism tests 
appeared to be unsatisfactory because of the 
simplicity of the thought patterns involved in 
the items of these tests. Reasoning tests such 
as those of Burt and Dale seemed to be better 
suited for the purpose of this study, but they 
involved the making of logical inferences in 
situations in which prejudices and emotions 
would be unimportant factors. The best avail- 
able test appeared to be one constructed by a 
group of workers at Ohio State University in 
connection with the evaluation program of 
the Eight-Year Study of the Progressive Edu- 
cation Association. The tests used were in 
part adapted from and in part patterned after 
this last mentioned test. 

In each item of these tests a discussion of 
some issue is presented and the pupil is asked 
to check one of several possible conclusions 
as being the one which is most consistent with 
the facts stated in the discussion. The pupil 
is then required to select from a list of state- 
ments those statements which can best be 
used to support the conclusion he has previ- 
ously checked. Many of the issues concern 
controversial matters. 

It should be observed that a number of 
factors in reflective thinking which were em- 
phasized in the experimental classes are not 
covered by the items of these tests. Although 
a classification of the principles of reflective 
thinking which were treated in the experi- 
mental method is difficult to make because of 
much overlapping, it may be said that 
emphasis was placed upon the following: 


1. If-then or postulational thinking

2. The importance of defining key words and phrases

3. Reasoning by generalization

4. Reasoning by analogy

5. Detecting implicit assumptions

6. Inverses and converses

7. Indirect proof

8. Name calling


Of these eight principles, the test items 
were concerned chiefly with numbers 1, 2, 7, 
and 8, and to a lesser degree with number 4. 
Those principles which do not appear in the 
test items were regarded as important in the 
experimental method; therefore, it may be 
that there were some values of the experi- 
mental method which could not be revealed 
in a comparison of scores on these tests. 










No pencil-and-paper test can give a very 
satisfactory picture of the actual behavior of 
individuals in all kinds of situations. Obvi- 
ously, the mere fact that an individual dem- 
onstrates ability to make logical deductions 
and to discriminate between those aspects of 
problems which are pertinent and those which 
are irrelevant on a test of this kind is no 
guarantee that such an individual will do the 
same thing in his everyday affairs. On the 
other hand, if an individual is incapable of 
distinguishing between logical and illogical
conclusions and relevant and irrelevant state- 
ments on this test, it is unlikely that he will 
make such distinctions in his everyday 
thinking.

In order to get some notion of the relation- 
ship between the two reasoning tests which 
were used as the initial test and the final test, 
these tests were given to pupils in four classes 
from schools in two cities. The classes were 
selected as representing the same kind of 
pupils as those included in the experiment. 
Of the ninety-seven pupils who took both 
tests, sixty-one were geometry students and 
thirty-six were not. The tests were taken on 
consecutive days in three classes. In the other 
class both tests were taken on the same day. 
The initial test was given first in two classes. 
There was no measurable practice effect. The 
pupils who took the final test first had just 
as much gain from initial test score to final 
test score as did the pupils who took the tests 
in the other order (in fact, an average of two- 
tenths of a point more). The relationship be- 
tween the two tests is shown in Table VI. 


TABLE VI


RELATIONSHIP BETWEEN SCORES ON INITIAL
AND FINAL REASONING TESTS WHEN
TAKEN TOGETHER


                              Initial Test    Final Test
Mean                          32.1            36.9
S.D.                          13.3            15.1
Coefficient of Correlation
between Scores on the Two
Tests                         .77


The final test had 110 possible points, as compared with 70 possible points on the initial test. The results of Table VI show that in general a score on the initial test is equivalent to a final test score of about 4.8 points higher. Although it should not be inferred that this difference is the same at all points on the scale, the increase from initial test score to final test score may be used as a fairly reliable index of the gain in reasoning ability when groups of students are being considered.


TABLE VII


DISTRIBUTION OF SCORES ON THE INITIAL AND
FINAL REASONING TESTS IN THE THREE
EQUATED GROUPS


                 Initial Test                      Final Test
           Non-Geom.  Geom.     Exper.     Non-Geom.  Geom.     Exper.
Score      Control    Control   Group      Control    Control   Group
100-109                                               ?         ?
90-99                                                 ?         ?
80-89                                                 4         ?
70-79                 1         2          3          13        ?
60-69      3          4         6          9          26        ?
50-59      12         22        16         23         42        87
40-49      34         53        38         48         59        68
30-39      93         79        91         109        83        35
20-29      115        103       111        109        71        14
10-19      69         59        53         26         28
0-9        4          9         13         3          2
Number     330        330       330        330        330       330
Mean       29.0       30.9      30.2       33.9       40.2      56.9
(with PE)  (±.4)      (±.5)     (±.5)      (±.5)      (±.6)     (±.6)
S.D.       11.2       13.0      12.9       12.4       17.0      17.2

The results of the three equated groups on 
the reasoning tests are given in Tables VII 
and VIII. If allowance is made for the fact 
that the final test scores are about 4.8 points 
higher than equivalent scores on the initial 
test, the results of Table VIII show that there 
was no measurable gain in reasoning in the 
Non-Geometry Control Group, a very slight 
gain in the Geometry Control Group, and a 
marked gain in the Experimental Group. 


TABLE VIII


DIFFERENCES BETWEEN THE MEANS OF INITIAL
AND FINAL TEST SCORES FOR EACH OF
THE THREE EQUATED GROUPS


                              Difference: Mean of Final Test
                              Scores Minus Mean of Initial Test
Group                         Scores (with PE diff.)
Non-Geometry Control Group    4.9 (±.6)
Geometry Control Group        9.3 (±.8)
Experimental Group            26.7 (±.8)
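The probable errors in Tables VII and VIII appear consistent with the usual formula for the probable error of a mean (0.6745 times the standard deviation divided by the square root of N), with the two means of each group combined as though they were independent. The Python sketch below recomputes the gains and their probable errors from the Table VII means and standard deviations under that assumption, and also subtracts the 4.8-point offset from Table VI. The independence assumption and the function names are illustrative, not taken from the article; this is a consistency check on the published figures, not necessarily the writer's own computation.

```python
import math

N = 330             # pupils in each equated group (Table IV)
PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error
TEST_OFFSET = 4.8   # final-test equivalent of an unchanged initial score (Table VI)

# (initial mean, initial S.D., final mean, final S.D.) taken from Table VII
GROUPS = {
    "Non-Geometry Control": (29.0, 11.2, 33.9, 12.4),
    "Geometry Control": (30.9, 13.0, 40.2, 17.0),
    "Experimental": (30.2, 12.9, 56.9, 17.2),
}


def pe_of_mean(sd: float, n: int) -> float:
    """Probable error of a mean: 0.6745 * sd / sqrt(n)."""
    return PE_FACTOR * sd / math.sqrt(n)


def summarize(init_mean, init_sd, final_mean, final_sd, n=N):
    """Return (raw gain, PE of the gain, gain net of the 4.8-point test offset).

    Assumption: the two means are combined as if independent, i.e. the
    correlation term of the PE-of-a-difference formula is omitted.
    """
    gain = final_mean - init_mean
    pe_gain = math.hypot(pe_of_mean(init_sd, n), pe_of_mean(final_sd, n))
    return gain, pe_gain, gain - TEST_OFFSET


for name, stats in GROUPS.items():
    gain, pe_gain, net = summarize(*stats)
    print(f"{name:22s} gain {gain:5.1f} (±{pe_gain:.1f})  net of offset {net:5.1f}")
```

Under this assumption the probable errors round to the ±.6, ±.8, and ±.8 of Table VIII, and the gains net of the offset come to about 0.1, 4.5, and 21.9 points, in line with the statement that the Non-Geometry Control Group showed no measurable gain, the Geometry Control Group a very slight gain, and the Experimental Group a marked gain.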
As might have been expected, there were
differences among the results obtained by the 
teachers within the Experimental Group and 
also within the Geometry Control Group. 
These results are shown in Table IX. This
table shows that every experimental teacher 
secured results, as measured by these tests, 
superior to those secured by any control 
teacher. Furthermore, the only marked in- 
terval in the column of numbers under “Mean 
Gain” is the one separating the experimental 
teachers from the control teachers. 


TABLE IX


RESULTS OBTAINED BY THE TEACHERS WITHIN
THE EXPERIMENTAL GROUP AND THE GEOM-
ETRY CONTROL GROUP AS MEASURED BY THE
REASONING TESTS


                    Number of  Number of  Mean    Mean    Mean Gain
Teacher   School    Classes    Pupils     Age     I.Q.    (with PE)
E-7       V         2          48         15-7    99.7    29.1 (±1.5)
E-10      VI        1          27         16-4    114.9   25.3 (±2.1)*
E-1       I         2          65         15-5    108.2   25.? (±1.0)
E-8       VI        2          53         16-1    118.2   24.3 (±1.4)*
E-9       VI        4          100        16-5    110.4   24.0 (±0.8)*
E-6       IV        4          106        16-2    113.1   22.7 (±1.0)*
E-4       III       2          41         15-9    106.5   21.3 (±1.0)
E-2       II        2          52         16-1    106.1   21.2 (±1.5)
E-3       III       1          23         15-8    103.2   21.1 (±1.5)
E-5       III       1          27         15-6    107.3   20.0 (±2.0)
C-3       II        3          76         15-7    114.1   13.2 (±1.?)
C-2       II        1          30         16-2    104.0   11.4 (±1.5)
C-6       IV        3          65         16-6    109.7   9.6 (±1.1)*
C-5       II        1          24         15-6    109.4   6.2 (±1.3)
C-7       VI        3          79         16-2    117.5   6.9 (±1.0)*
C-8       VII       1          15         15-11   117.0   4.9 (±4.1)*
C-1       I         1          28         15-7    102.6   4.3 (±1.6)
C-4       III       2          47         15-7    104.2   3.9 (±1.4)





The gains of 11th grade classes in Table
IX are marked with asterisks; all other gains 
are for 10th grade classes. 


In attempting to analyze some of the differences which appear within the groups in Table IX, use is made of the following data:


1. Records of verbal statements made by 
the teachers before the experiment began 
and also in the course of the experiment 
concerning previous experience with the 
experimental method, amount and nature 
of preparation made for the semester’s 
work, the progress of their classes in the
experiment, etc. 

2. Statements of a similar nature made by 
teachers in letters. 

3. Statements obtained from questionnaires 
at the end of the semester. 


Among the experimental teachers there was 
much variation both in the amount of time 
devoted to non-geometric applications and in 
the amount of reference material used. All of 
these teachers made use of the Thirteenth 
Yearbook of the National Council of Teachers 
of Mathematics and Clear Thinking by 
Schnell and Crawford, as well as the manual 
referred to earlier in this article. 

The two experimental teachers (E-7 and E-10) whose classes made the highest gains attended a summer school course on logic in geometry during the summer preceding the experiment. The experimental teacher (E-5) whose class made the lowest gain was the last teacher to enter the experiment and had the smallest amount of preparation for the study. Teacher E-4 spent the summer preceding the experiment at a workshop of the Progressive Education Association, preparing materials to use with her experimental classes. Teacher E-2 remarked a number of times during the experiment that he was not spending as much time on non-geometric applications as he had expected.

The teachers of geometry control classes 
were requested to answer the following four 
questions at the end of the semester: 


1. In introducing the idea of proof, did you 
use such non-geometric material as 
syllogisms?

2. If you used indirect proof in geometry, 
did you use illustrations outside of 
geometry to make the meaning of in- 
direct proof clear? 

3. Do you recall any other places in the 
course where non-geometric material 
was used for purposes of illustration? 

4. Do you feel that you have succeeded 
pretty well in avoiding the considera- 
tion of applications of logical thinking 
in non-geometric situations? 


The two control teachers (C-3 and C-2)
whose classes made the highest gains in the
Geometry Control Group gave affirmative
answers to the first three questions. Teacher
C-3 gave the following answer to Question 3:
“Observation of geometric forms in
everyday life. Have pupils keep account 
for a week or more of any uses they observe 
for applications of their knowledge of 
geometry—until it becomes more or less 
habitual.” 


This teacher did not answer Question 4, 
possibly believing that her answer to Ques- 
tion 3 had indicated a negative answer to 
Question 4. Teacher C-2 (who was also E-2)
said that he felt that the experimental method 
had influenced his teaching in the control 
class to such an extent that he did not expect 
much difference between the results in his 
experimental and control classes. Teachers 
C-6, C-5, and C-7 reported that they made
some applications of the kind of thinking 
done in geometry to non-geometric situations. 


In addition to the data obtained through 
the use of the reasoning tests, other facts were 
observed and recorded by the teachers con- 
cerning changes in the pupils’ behavior in the 
direction of improved thinking habits. Al- 
though the writer is aware of the limitations 
of such data for purposes of evaluation, he 
believes that the observations and reactions 
of the teachers who use a teaching method 
with their classes should form an important 
factor in evaluating that method. 


All of the experimental teachers reported 
that the pupils enjoyed the work and most 
of them indicated that the pupils had ex- 
pressed themselves as believing that the work 
was of value to them. Nearly all of the teach- 
ers reported that pupils had contributed un- 
solicited illustrations of clear thinking or the 
lack of it in many different fields. Some of 
the teachers reported large numbers of such 
contributions. Most of the teachers observed 
that their pupils were more critical in their 
thinking in geometry than had been the case 
in previous years. Three teachers reported 
that teachers of other subjects had commented 
on the improvement in thinking habits of cer- 
tain pupils in the experimental group. There 
were but two references to comments by par- 
ents concerning the improvement in thinking 
of pupils. However, most of the teachers 
added that they had not sought such com- 
ments from other teachers or parents. One 
teacher reported that she did not get as much 
voluntary pupil activity as she desired. 
Nearly all of the teachers indicated that they expected to continue with this method for the rest of the semester and that they intended to use it again next year.

Following are some other comments made 
by individual teachers: 


“One boy frequently referred to his 
chemistry textbook as having descriptions 
which closely resembled geometric proofs.” 

“We always discussed the assembly 
speakers from the standpoint of ‘if-then’.” 

“All but one of the pupils in the class 
definitely requested to be allowed to stay in 
the class which was studying clear think- 
ing.” 

“Never a question of ‘why do we have to take geometry’.”

“One boy said his work in reasoning 
helped him with his Latin.” 

“The method developed an awakened consciousness as to the importance of the meaning of words.”

“It improved reading and oral expres- 
sion.” 

“Whether it shows up on the tests or 
not, I am sure that it has helped my 
students.” 


Each of the three equated groups was separated into low, average, and high ability divisions by placing in the first, pupils with I.Q.'s of less than 100; in the second, pupils with I.Q.'s from 100 to 119; and in the third, pupils with I.Q.'s of 120 or higher. The lowest division contained about 18 percent of the pupils within each group, the middle about 60 percent, and the highest about 22 percent. The mean gains of these divisions are shown in Table X.

There did not appear to be any very satisfactory way of comparing the Experimental Group and the Geometry Control Group with respect to achievement in geometric subject matter. This was due to the lack of uniformity in the content of the first semester geometry course in different schools and even in different classes within the same school. The sequence of theorems varied from teacher to teacher and, although the content of the course for the entire year would be about the same in all classes, there was no uniformity at the end of the first semester. This made the giving of the same geometry test to all classes virtually impossible.

TABLE X

MEAN GAINS IN REASONING TEST SCORES FOR LOW, AVERAGE, AND HIGH ABILITY DIVISIONS OF
EACH OF THE EQUATED GROUPS

                               Low Division     Middle Division    High Division
                               (I.Q.'s Less     (I.Q.'s From       (I.Q.'s of 120
Group                          than 100)        100-119)           and Higher)
Experimental Group                              25.2               30.7
Geometry Control Group                                             13.4
Non-Geometry Control Group                                         4.0

Some evidence as to achievement in geometry was desired; therefore, each experimental teacher was asked to give tests over geometric subject matter and to answer the following question at the end of the semester:

“Do you believe that there was any loss in the understanding of geometric relationships because of the attention directed to clear thinking outside of geometry? (Since the results on your semester geometry examination probably will be an important factor in determining your answer, we suggest that you wait until all of your testing is done before you answer this question.)”

Nine of the ten teachers answered “No” to the question, and the other stated that she believed there was little difference. Five teachers added that their pupils had shown a better understanding of geometric relationships. One teacher said that his classes had done better on virtually every test during the semester than had other classes on the same tests. Another teacher said that, although she regarded her experimental class as potentially her weakest, the pupils in that class “were less affected by habit, they read the questions more critically, and concentrated better in answering them.” One teacher sent a description of the semester test she had used and added that in her judgment the work on reasoning had increased her pupils’ understanding of geometric relationships.


CONCLUSIONS 


The results of this study indicate that it is 
possible for high school geometry teachers, 
under normal classroom conditions, to teach 
in such a way as to cultivate reflective think- 
ing, that this can be done without sacrificing 
an understanding of geometric relationships, 
and that pupils at all I.Q. levels are capable 
of profiting from such instruction. The results 
also indicate that even what is commonly re- 
garded as superior geometry teaching has 
little effect upon pupils’ behavior in the direc- 
tion of reflective thinking unless definite pro- 
visions are made to study methods of thinking 
as an important end in itself. 











THE RELATIVE CONTRIBUTION OF CERTAIN FACTORS 
TO INDIVIDUAL DIFFERENCES IN ALGEBRAIC 
PROBLEM SOLVING ABILITY* 


WILMA ROSE KELLAR
Boston College 


INTRODUCTION 


The solution of a verbal problem in algebra may be considered a single activity only insofar as it is the summation and resultant of a group of component abilities. The nature of the human personality is such that achievement in any given field necessarily involves a combination of various factors.


For purposes of teaching, it would be of practical value to know what these factors are and the relative importance of each in determining the level of achievement. With such information, classroom instruction could more readily emphasize the essential elements.


The present investigation was undertaken in 
an effort to determine the specific abilities or 
factors in a pupil’s mental equipment which 
accompany or affect his achievement in the 
solution of verbal problems in high school 
algebra. The factors which have been consid- 
ered in this study include the following: 


I. Intelligence 
II. Reading 
III. Memory 
IV. Arithmetic computation 
V. Arithmetic problem solving 
VI. Technical vocabulary of algebra 
VII. Fundamental operations of computation in algebra
VIII. Analysis of verbal problems in algebra


While exhaustive bibliographies may be 
compiled for research which has been carried 
on in regard to algebra, it is noteworthy that 
the studies dealing specifically with the solu- 
tion of verbal problems in algebra are few in 
number. Various prognosis studies have been made in this field. They, however, deal with general achievement (including computation) in algebra and are not applicable to the more restricted study of the solution of verbal problems in algebra.

* A dissertation submitted to the faculty of the Graduate School of Arts and Sciences of the Catholic University of America in partial fulfillment of the requirements for the degree of Doctor of Philosophy.


McLeod and McIntyre (14) note that 
verbal problems are the chief stumbling-block 
in algebra. Although presenting no experi- 
mental evidence, they suggest various sources 
of difficulty. These include the necessity for 
pupil responsibility and ingenuity rather than 
any stereotyped method of approach. “The 
pupil must select, translate, relate—all in one 
question.” Rote memory or the application of 
any one general rule will not suffice. Rather, 
the formation of the essential equation will 
require clear and perhaps original thinking. 
Again, the language of algebra may not be adequately understood, so that the facts of the problem are not properly translated when the equation itself is set up in correct algebraic form and content. Ignorance of, or carelessness in, the solution of the equation may result in error. And, finally, solutions are rarely checked against the original conditions of the problem, a process which would frequently show the absurdity of an obtained answer.

Clem and Hendershot (6), following a de- 
tailed analysis of the errors of 80 ninth grade 
pupils on a test of verbal problems in algebra, 
concluded that the most common causes of failure are, on the part of the teacher, lack of preparation and knowledge of technique; and, on the part of the pupils, inability to read, lack of any systematic approach, lack of knowledge of arithmetic, and lack of proper checking.

Hawkins (9) likewise stresses the need for precise reading, the ability to express in algebraic symbols the words and phrases of the problem, and the ability to recognize the relationship involved, to form the necessary equations, and to solve them. Hawkins demonstrates that while practice in translating English expressions into algebraic symbols may not raise test scores, training in problem analysis has a favorable influence on achievement in the solution of verbal problems.

Stright (17) found that training in the reading of algebra problems led to a significant increase (critical ratio = 3.09) in the correct solution of verbal problems.

Buckingham (3, 2) reports a correlation of .25 ± .05 between vocabulary test marks and scores on Part II of the Cooperative Algebra Test, which is composed of algebra verbal problems. Correlations ranging from .21 ± .07 to .39 ± .06 were found between algebra verbal problems, as measured by the Cooperative Algebra Test, and reading, as measured by the sections of the Gates Silent Reading Test.

He concludes that the correlations are so 
low that there must be other necessary abil- 
ities present to attain achievement in algebra 
and notes that while such correlation coeffi- 
cients have little significance if used alone, 
when used as measures of a single factor 
among several factors they indicate enough 
relationship to be considered. 


While previous studies point in a general 
way to certain factors related to this problem, 
further investigation in this field will prove 
of value. 


EXPERIMENTAL PROCEDURE 


The Tests 

In selecting the tests to be used in this 
experiment, an attempt was made to measure 
those factors which, from a subjective analy- 
sis, would be expected to influence ability in 
solving algebra verbal problems. The battery 
which was devised included the following: 

Intelligence: Exercises in Cognitive Ability, Form A, as developed by Sr. Maurice McManama (15), was employed as a measure of general intellectual ability. The
test is composed of five sections—Discrimi- 
nation, Analogy, Completion, Definition, and 
Proverbs—all of which have been shown to 
fulfill the three requirements of (a) approx- 
imate satisfaction of the tetrad criterion, 
(b) low residual correlation when the general 
factor is partialed out, and (c) high correla- 
tion (.989) with the underlying general 
factor. The first and second show that there 
is a general factor running through the group 
of tests; the third indicates that the tests 
measure this factor with a considerable degree 
of accuracy. The reliability of the complete scale varies from .94 to .98. This intelligence test was selected, in part, because no section
of it contains any type of mathematics. Thus 
it gives a measure of cognitive ability which 
prevents overlapping with any mathematical 
factor under study. 
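The tetrad criterion cited above can be made concrete. For any four tests i, j, k, m, a single general factor implies that the tetrad differences such as r_ij·r_km - r_ik·r_jm vanish. The sketch below is only an illustration under stated assumptions: it uses hypothetical loadings rather than McManama's data, and the function name `tetrad_differences` is introduced here for the example. It builds a correlation matrix from one factor and confirms that every tetrad difference is zero; with real test scores the differences are only approximately zero, which is what "approximate satisfaction" of the criterion refers to.

```python
from itertools import combinations

def tetrad_differences(r):
    """Return the three tetrad differences for every set of four tests.

    r is a symmetric correlation matrix (list of lists). If a single
    general factor accounts for all the correlations, each difference is zero.
    """
    diffs = []
    for i, j, k, m in combinations(range(len(r)), 4):
        diffs.append(r[i][j] * r[k][m] - r[i][k] * r[j][m])
        diffs.append(r[i][j] * r[k][m] - r[i][m] * r[j][k])
        diffs.append(r[i][k] * r[j][m] - r[i][m] * r[j][k])
    return diffs

# Hypothetical general-factor loadings for five tests (illustrative only).
loadings = [0.9, 0.8, 0.7, 0.6, 0.5]
n = len(loadings)
r = [[1.0 if a == b else loadings[a] * loadings[b] for b in range(n)]
     for a in range(n)]

diffs = tetrad_differences(r)
print(len(diffs), max(abs(d) for d in diffs) < 1e-12)  # 15 True
```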

Reading: Iowa Silent Reading Tests, Ad- 
vanced Test, Form A. 

Memory: Fable Completion and Prose
Tests were shown by Monaghan (16) to have 
high correlations, .812 and .785 respectively, 
with a general memory factor. The reliability 
coefficients are .854 and .735. These two tests 
were used as measures of memory. 

Arithmetic: Unit Scales of Attainment, 
Form A, Grades VII-VIII. The test consists 
of two sections, one containing 25 problems in 
fundamental operations, the other containing 
25 verbal problems in arithmetic. 

The four tests dealing with algebra, devised 
by the writer, are based on a study of repre- 
sentative textbooks and such standardized 
tests as were applicable. 

Technical Vocabulary of Algebra: The 
Vocabulary Test, First Year Algebra, Form A 
consists of sixty questions of the multiple 
choice form as devised by the writer (12) in 
1937 when an item analysis was made for 
validity and difficulty. The type of question 
included is illustrated by the following item 
from the test. 


Variation means ( ) 
(1) independence (2) constancy (3) 
change (4) equality (5) uniformity 


Fundamental Operations in Algebra: The 
test consists of fifty problems in computation 
of the following form: 


1. Evaluate 2(4 - 6) ...
2. State the product of ...
3. Expand (4x - S8y)* ...


Analysis of Verbal Problems in Algebra: 
The test consists of fifteen verbal problems, 
each of which was followed by multiple choice 
statements in the following manner. 


PROBLEM: What number is that which increased by 5/9 of itself equals 28?

A. WHICH OF THE FOLLOWING FACTS ARE GIVEN IN THE PROBLEM? ( ) A.
(1) A number is decreased by 5/9 of itself. (2) A number equals 28. (3) A number is increased by 5/9 of itself. (4) 5/9 of a number equals 28.

B. WHICH OF THE FOLLOWING FACTS ARE YOU ASKED TO FIND OUT IN THE PROBLEM? ( ) B.
(1) What number equals 28. (2) What number which when added to 5/9 of itself equals 28. (3) Why 5/9 of a number equals 28. (4) What 2 numbers equal 28.

C. WHICH OF THE FOLLOWING EQUATIONS SHOULD BE USED IN SOLVING THE PROBLEM? ( ) C.
Let x = the number
(1) (x + 5x)/9 = 28
(2) y + 5x/9 = 28
(3) 5x/9 = 28
(4) x + 5x/9 = 28

D. WHICH OF THE FOLLOWING IS THE MOST REASONABLE ANSWER? ( ) D.
(1) 5/8 of 18 (2) 21 (3) 18 (4) 5/9 of 28
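The arithmetic behind parts C and D of this sample item can be checked directly. The sketch below is an illustration, not part of the test: it solves option (4) exactly with Python's `fractions` module and, in the spirit of checking a solution against the original conditions of the problem, shows that the roots of two distractor equations fail the stated condition. The helper name `satisfies_conditions` is hypothetical.

```python
from fractions import Fraction

def satisfies_conditions(x):
    """The stated condition: the number increased by 5/9 of itself equals 28."""
    return x + Fraction(5, 9) * x == 28

# Option (4): x + 5x/9 = 28, i.e. (14/9)x = 28.
x = Fraction(28) / (1 + Fraction(5, 9))

# Roots of two distractor equations, for comparison.
option_1 = Fraction(28) * 9 / 6   # (x + 5x)/9 = 28  ->  x = 42
option_3 = Fraction(28) * 9 / 5   # 5x/9 = 28        ->  x = 252/5

print(x, satisfies_conditions(x))                                       # 18 True
print(satisfies_conditions(option_1), satisfies_conditions(option_3))  # False False
```

Using exact fractions rather than floats keeps the equality test in `satisfies_conditions` free of rounding error.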

Verbal Problems in Algebra: The criterion 
was measured by two forms, A and B, of 
objective questions based on verbal problems. 
The type and difficulty of these problems were 
selected in compliance with the findings of a 
study by Varnhorn (19). Each test consists 
of fifteen problems of which the following is 
an example. 

A man walks 9 miles, then travels a certain distance by automobile, and twice as far by train. The whole trip is 108 miles.

(a) How many miles does he go by automobile? ______ a.
(b) How many miles does he go by train? ______ b.
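The sample criterion problem reduces to the equation 9 + x + 2x = 108, where x is the distance traveled by automobile. The following minimal sketch solves it and then checks the answer against the original conditions, the step McLeod and McIntyre report pupils rarely take. The function `solve_trip` and its parameters are illustrative names introduced here.

```python
def solve_trip(walk=9, total=108, train_ratio=2):
    """Solve walk + x + train_ratio * x = total for the automobile distance x."""
    auto = (total - walk) / (1 + train_ratio)
    train = train_ratio * auto
    return auto, train

auto, train = solve_trip()

# Check the answer against the original conditions of the problem.
assert 9 + auto + train == 108 and train == 2 * auto
print(auto, train)  # 33.0 66.0
```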


The time limits and scoring for the Read- 
ing, Memory, and Arithmetic tests complied 
with directions as stated in their development. 

The time limits for the Intelligence, Algebra Computation, Algebra Analysis, Algebra Vocabulary, and each form of the Algebra Verbal Problems tests were 40-45 minutes, i.e., one class period.

All algebra tests were scored by giving one 
credit for each space correctly filled. 
The Subjects


The test battery was administered in 
schools of Detroit, Michigan, during the last 
two weeks of May, 1938. Four hundred ninth 
grade boys and girls in nine parochial high 
schools participated. The schools were the 
following: 


1. Annunciation High School
2. Our Lady of Lourdes’ High School
3. St. Agnes’ High School
4. St. Cecelia’s High School
5. St. Charles’ High School
6. St. Gregory’s High School
7. St. Martin’s High School
8. St. Mary of Redford High School
9. St. Rose’s High School


All tests were presented by the teachers of the classes in accordance with explicit directions contained in the test booklet. The testing program extended over a period of two weeks. The sequence of presenting the tests was the same in each school, and as far as possible the interval between tests was constant for all schools.


The results to be used for statistical analy- 
sis were confined to the scores of those pupils 
who received every test of the battery. Due 
to this rule the original number of 400 sub- 
jects was reduced to 284. 


STATISTICAL ANALYSIS 


Chronological age measures are given in Table I.


Such age measures are typical of ninth 
grade pupils (1,16). To determine whether 
the factor of age affected the scores to any 
appreciable degree, the ages of the subjects, 
expressed in months, were correlated with test 
scores. The resulting correlations approxi- 
mated zero. 


Since all the pupils in the algebra classes of nine high schools were used as subjects, it would seem that the group is representative of pupils enrolled in first year high school. Any departure from representative sampling is insufficient to bias the results.

TABLE I

CHRONOLOGICAL AGE OF SUBJECTS

                                Months
                                                      Standard
N      Sex      Age Range       Mean                  Deviation
102             159 to 204      180.62                8.64
182             158 to 197      178.73                6.54
284    Total                    179.40                7.18

TABLE II

MEANS AND STANDARD DEVIATIONS OF TEST SCORES

                               Total Working     Possible               Standard
Test                           Time in Minutes   Score       Mean       Deviation
1. Algebra Problems            90                60           30.24     10.19
2. Algebra Computation         45                50           13.11      9.22
3. Algebra Vocabulary          45                60           31.65      6.67
4. Algebra Analysis            45                60           29.62      8.53
5. Arithmetic Problems         20                25           17.76      4.72
6. Arithmetic Computation      20                25           11.64      5.84
7. Intelligence                45                240         133.06     25.70
8. Reading                     45                234         101.69     31.02
9. Memory: Prose               15                20           12.18      4.65
10. Memory: Fables             30                110          90.40     11.54

TABLE III

INTER-CORRELATIONS OF TESTS*

                               1     2     3     4     5     6     7     8     9     10
1. Algebra Problems                  .698  .593  .476  .610  .515  .510  .506  .428  .323
2. Algebra Computation         .021        .576  .378  .529  .727  .402  .381  .342  .242
3. Algebra Vocabulary          .026  .027        .326  .472  .422  .561  .556  .441  .390
4. Algebra Analysis            .031  .034  .036        .365  .208  .427  .421  .378  .149
5. Arithmetic Problems         .025  .029  .031  .035        .594  .445  .482  .240  .214
6. Arithmetic Computation      .029  .019  .033  .038  .026        .485  .431  .343  .377
7. Intelligence                .030  .034  .027  .033  .032  .031        .758  .418  .457
8. Reading                     .030  .034  .028  .033  .031  .033  .017        .436  .466
9. Memory: Prose               .033  .035  .032  .034  .038  .035  .033  .032        .423
10. Memory: Fables             .036  .038  .034  .039  .038  .034  .032  .031  .033

* Upper right figures are correlation coefficients; lower left figures are probable errors of the corresponding coefficients.

Table II presents the means and the 
standard deviations of scores for each test. 

The reliability of the criterion test was determined. Intercorrelation of forms A and B of the Algebra Verbal Problems test resulted in a coefficient of .6749. Since the scores used in the later computations were the sums of both forms, this coefficient was stepped up according to the Spearman-Brown formula, giving a final coefficient of reliability of .8058.
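The Spearman-Brown step-up can be written as r_k = k·r / (1 + (k - 1)·r), where r is the reliability of one form and k is the factor by which the test is lengthened (here k = 2, since both forms were summed). The following minimal sketch applies it to the reported correlation between forms. It gives about .8059, which agrees with the reported .8058 to within rounding in the last place.

```python
def spearman_brown(r_half, k=2):
    """Reliability of a test k times as long as one whose reliability is r_half."""
    return k * r_half / (1 + (k - 1) * r_half)

r_ab = 0.6749  # correlation between forms A and B of the criterion test
print(round(spearman_brown(r_ab), 4))  # 0.8059
```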

The scores for all tests were intercorrelated (Pearson). The intercorrelations, together with their respective probable errors, are given in Table III.

All of the correlations are positive, and in all cases except one ($r_{4,10} = .149 \pm .039$) the coefficients exceed four times their probable errors.
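The probable error of a correlation coefficient in use at the time is PE = 0.6745 (1 - r²)/√N. The sketch below applies it with N = 284 and tests the "four times its probable error" rule. It reproduces the .039 reported for r = .149 and the .021 shown in Table III for r = .698. The helper name `probable_error_r` is introduced for the example.

```python
import math

def probable_error_r(r, n):
    """Classical probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

N = 284  # pupils who took every test in the battery
for r in (0.149, 0.208, 0.698):
    pe = probable_error_r(r, N)
    print(f"r = {r:.3f}  PE = {pe:.3f}  exceeds 4 PE: {r > 4 * pe}")
# r = 0.149  PE = 0.039  exceeds 4 PE: False
# r = 0.208  PE = 0.038  exceeds 4 PE: True
# r = 0.698  PE = 0.021  exceeds 4 PE: True
```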

The coefficient of multiple correlation ($R_{1.2\,3\,4\,5\,6\,7\,8\,9\,10}$) was found to be .8057 ± .014. The beta weights, arranged in rank order, are as follows:


TABLE IV 


BETA WEIGHTS ARRANGED IN RANK ORDER FOR 
NINE INDEPENDENT VARIABLES 





Test                            Beta Weight
2. Algebra Computation          .5091
5. Arithmetic Problems          .2965
DSS IID ee tein .1030 
RO .0950 
4. Algebra Analysis             .0924
3. Algebra Vocabulary           .0738
10. Memory: Fables              .0705
i 2 |) eee .0260 
6. Arithmetic Computation       -.2006
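Beta weights are the solution of the normal equations R_xx·β = r_xy, and the squared multiple correlation is R² = Σ β_j r_1j. The full nine-predictor computation needs every cell of Table III. As a smaller sketch, the three predictors used later in the path analysis (2, 5, and 7) reproduce the R1.257 row of Table V. The elimination routine `solve_linear` is a generic helper written for this example, not the author's computing procedure.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

# Intercorrelations of predictors 2, 5, 7 and their correlations with the
# criterion (variable 1), as given in Table III.
r_xx = [[1.000, 0.529, 0.402],
        [0.529, 1.000, 0.445],
        [0.402, 0.445, 1.000]]
r_xy = [0.698, 0.610, 0.510]

betas = solve_linear(r_xx, r_xy)
R = math.sqrt(sum(b * r for b, r in zip(betas, r_xy)))
print([round(b, 3) for b in betas], round(R, 3))  # [0.475, 0.27, 0.199] 0.773
```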
No correlation technique can show the
direction of causation. Which of two or more 
variables are causes and which are effects 
must be determined by logical analysis or by 
experiment. In the present study, for the sake 
of analysis, it is assumed a priori that prob- 
lem solving achievement is contributed to by 
all the variables under consideration and by 
unknown causes. This seems to be a reason- 
able assumption. 

The coefficient of correlation between the 
criterion and any other variable is not an 
index of the contribution of that one variable 
to problem solving ability, because the group 
is heterogeneous with respect to all other 
variables as measured. Partial correlation is 
often used to eliminate such heterogeneity. 
This technique, however, “partials out” too 
much when it renders constant factors which 
may in part or in whole be caused by either 
of the two factors whose true relationship is 
to be measured (4). Partial correlation tech- 
niques are inadequate for the present prob- 
lem, since the variables are both causes and 
effects of each other. 

The path coefficient technique (20, 21, 10, 
11, 5) provides for the measurement of both 
direct and indirect influences. The use of co- 
efficients of determination leads to an analysis 
of variance (18, 8), showing the component 
percentages contributed by each known inde- 
pendent variable to the variance of the 
dependent variable. 

A path coefficient ($p_{1.j}$) is defined as the ratio of that part of the standard deviation of a variable which is due to another variable to the total standard deviation of the variable:

$$p_{1.j} = \frac{\text{S.D. of variable 1 due to the } j\text{th variable}}{\text{total S.D. of variable 1}}$$


When only two variables are considered, 
the path coefficient is equal to the coefficient 
of correlation between the two variables. 
When several variables are involved, the coefficient of correlation between any two of them is equal to the path coefficient of the path directly connecting them plus the sum of the products of the path coefficients along each of the paths indirectly connecting them. Paths leading to and through the dependent variable are not considered. According to this theorem, a series
of simultaneous equations may be constructed 
involving the intercorrelations and the path 
coefficients. The intercorrelations being 
known, the solution of the equations gives the 
values of the path coefficients which in turn 
form the basis for the coefficients of determination.

The coefficients of determination measuring the direct influences are equal to the squares of the path coefficients of the paths leading directly from each of the independent variables to the dependent variable. The fundamental equation is

$$d_{1.j} = p_{1.j}^{2}$$

The coefficients of determination measuring the indirect or combined influences are equal to twice the product of the two paths from the given pair of independent variables multiplied by the coefficient of correlation between the two independent variables. The fundamental equation is

$$d_{1.jk} = 2\,p_{1.j}\,p_{1.k}\,r_{jk}$$

The sum of the direct and indirect coefficients of determination equals the total variance in the dependent variable which is due to the known independent variables. The determination by the unknown causes may be represented by $d_{1.x}$. The summation of all influences is equal to unity according to the fundamental equation:

$$\sum d_{1.j} + \sum d_{1.jk} + d_{1.x} = 1$$

where $\sum d_{1.j}$ includes all direct and $\sum d_{1.jk}$ includes all indirect coefficients of determination. $\sum d_{1.j} + \sum d_{1.jk}$ is equal to the coefficient of multiple determination, or the square of the coefficient of multiple correlation between the dependent variable and the known independent variables (13). $d_{1.x}$ is the coefficient of multiple non-determination, that is, the square of the coefficient of multiple alienation; it is also equal to the difference between unity and the sum of the coefficients of determination.

When the indirect influences are separated according to the weights indicated by the coefficients of direct determination, they may be combined with their respective direct coefficients to give the total percentage of variance in the dependent variable which is due to each independent variable.

The nature of the technique is such that its degree of complexity increases rapidly with each additional variable (13). The calculation of path coefficients for the ten variables of the present study would involve the solution of forty-five equations in forty-five unknowns. While the solution is theoretically possible, the computational difficulties would be of such magnitude as to render it impractical. To avoid such complications it was determined to apply the technique to the variables in sets of four. While the ten variables might be arranged in two hundred ten such combinations, it was realized that many of these groups would be of little value because they would account for a relatively small percentage of variance.
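The counts quoted above follow from simple combinatorics. There is one path-tracing equation per intercorrelation, giving k(k - 1)/2 equations for k variables, and four variables can be drawn from ten in C(10, 4) ways. A quick check:

```python
from math import comb

# One path-tracing equation per pair of variables: k(k - 1)/2 for k variables.
equations = {k: comb(k, 2) for k in (4, 6, 8, 10)}
print(equations)   # {4: 6, 6: 15, 8: 28, 10: 45}

# Ways of choosing sets of four from the ten variables.
print(comb(10, 4))  # 210
```

The four-variable case yields six equations, which is the size of the system solved below for Figure 1.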

By reference to Table IV it is noted that the beta weights for two of the variables, algebra computation and arithmetic problem solving, are several times as large as any other beta weight. Thus, as a preliminary measure, multiple correlation coefficients and beta weights were found for the criterion (algebra problem solving), algebra computation, and arithmetic problem solving as combined with each of the remaining variables in turn. The results are given in Table V.

It can be noted that the multiple correlation coefficients are quite similar in size and that the beta weights fall into definite groupings. To bring out the effect of intelligence, multiple correlation coefficients and beta weights were found for algebra problem solving, algebra computation, arithmetic problem solving, and intelligence as combined with each of the remaining variables in turn. The results are given in Table VI.


TABLE V

MULTIPLE CORRELATION COEFFICIENTS AND BETA WEIGHTS OF ALGEBRA PROBLEM SOLVING,
ALGEBRA COMPUTATION, ARITHMETIC PROBLEM SOLVING AND ONE OTHER VARIABLE

Multiple Correlation   Beta: Algebra   Beta: Arithmetic   Beta: Other   Other Variable
Coefficient            Computation     Problems           Variable
R1.259 = .775          .465            .318               .194          Memory: Prose
R1.257 = .773          .475            .270               .199          Intelligence
R1.254 = .773          .472            .290               .192          Algebra Analysis
R1.253 = .773          .424            .285               .215          Algebra Vocabulary
R1.258 = .772          .487            .258               .196          Reading
R1.25,10 = .764        .497                               .134          Memory: Fables
R1.256 = .760          .608            .273               -.151         Arithmetic Computation

TABLE VI

MULTIPLE CORRELATION COEFFICIENTS AND BETA WEIGHTS OF ALGEBRA PROBLEM SOLVING,
ALGEBRA COMPUTATION, ARITHMETIC PROBLEM SOLVING,
INTELLIGENCE AND ONE OTHER VARIABLE

Multiple Correlation   Beta: Algebra   Beta: Arithmetic   Beta:          Beta: Other   Other Variable
Coefficient            Computation     Problems           Intelligence   Variable
R1.2576 = .786         .595            .323               .236           -.224         Arithmetic Computation
R1.2579 = .784         .442            .274               .148           .150          Memory: Prose
R1.2574 = .784         .446            .250               .155           .150          Algebra Analysis
R1.2573 = .781         .419            .252               .145           .151          Algebra Vocabulary
R1.2578 = .776         .473            .252               .126           .108          Reading
R1.257,10 = .776       .469            .272               .166           .075          Memory: Fables











A noticeable degree of consistency is dis- 
played in Tables IV, V, and VI with regard 
to the rank order of importance of the various 
factors. The negative weight attached to 
arithmetic computation may be indicative of 
the opposing elements found in arithmetic 
computation and in algebra. An important 
example of negative transfer may be cited in 
that the use of signed numbers may change 
the response in each of the four fundamental 
operations. Apparently intelligence tends to 
increase individual differences with respect to 
arithmetic computation, i.e., while the posses- 
sion of intellectual insight eliminates the in- 
terference caused by signed numbers, such in- 
terference produces increasing difficulties for 
the dull child. Evidence for such a conclu- 
sion is found in the change of the rank of the 
multiple correlation coefficient from seventh, 
or last, in Table V (intelligence excluded) to 
first in Table VI (intelligence included). 

Since the coefficient of multiple determina- 
tion is equal to the square of the multiple 
correlation coefficient, that set of four vari- 
ables producing the highest multiple correla- 
tion coefficient should be used in the applica- 
tion of the path coefficient technique. 


From Table V it would appear that this 
group would include algebra problem solving, 
algebra computation, arithmetic problem 
solving, and memory for prose. However, in- 
spection of the test for memory of prose 
rendered its use questionable. It had not been 
noted previously that the answers to thirty- 
five per cent of the questions involved figures. 
Such a fact leads one to suspect that a special 
numerical element is the cause of the high 
weight of this test. The loading of a memory 
test with such a number element would probably lead to a spuriously high beta weight.
For this reason the test was discarded from 
further consideration. 

Path coefficients were then found for the 
group including algebra problem solving, 
algebra computation, arithmetic problem solv- 
ing, and intelligence. 

FIGURE 1. Variables and Paths of Influence.

In Figure 1, the rectangles 1, 2, 5, and 7 represent the variables algebra problem solving ability, algebra computation ability, arithmetic problem solving ability, and intelligence, as measured by the test scores. X represents unknown factors influencing algebra problem solving ability. The small letters represent the path coefficients. The arrows represent the direction of influence as assumed a priori. On the basis of these assumptions the following equations were derived.


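$$\begin{aligned}
.698 &= r_{12} = a + bd + cf + cde + bef\\
.610 &= r_{15} = b + ad + ce + aef\\
.510 &= r_{17} = c + af + be + ade\\
.529 &= r_{25} = d + ef\\
.445 &= r_{57} = e\\
.402 &= r_{27} = f + de\\
v &= \sqrt{1 - \Sigma d}
\end{aligned}$$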


Solution of these equations gave the fol- 
lowing values for the path coefficients: 


a = .475     d = .436
b = .270     e = .445
c = .199     f = .208
v = .634
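Once d, e, and f are known, the direct paths a, b, and c enter the first three equations linearly. The system can therefore be solved in two stages: e = r57 directly; d and f from the pair r25 = d + ef and r27 = f + de; then a, b, and c from the remaining three equations, which reduce to the normal equations for beta weights. The sketch below follows that route in plain Python with Cramer's rule. It is one way of solving the system, not necessarily the author's. Small differences in the third decimal (for example, d comes out near .4366) reflect rounding of the three-place correlations.

```python
import math

# Correlations from Table III (1 = criterion; 2, 5, 7 = predictors).
r12, r15, r17 = 0.698, 0.610, 0.510
r25, r27, r57 = 0.529, 0.402, 0.445

def det3(m):
    """Determinant of a 3 x 3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def cramer3(m, y):
    """Solve m x = y for a 3 x 3 system by Cramer's rule."""
    base = det3(m)
    return [det3([row[:j] + [y[i]] + row[j + 1:] for i, row in enumerate(m)]) / base
            for j in range(3)]

# Stage 1: paths among the predictors (7 -> 5 is e, 5 -> 2 is d, 7 -> 2 is f).
e = r57
d = (r25 - e * r27) / (1 - e * e)   # from r25 = d + e*f and r27 = f + d*e
f = (r27 - e * r25) / (1 - e * e)

# Stage 2: direct paths to the criterion (2 -> 1 is a, 5 -> 1 is b, 7 -> 1 is c).
a, b, c = cramer3([[1, r25, r27], [r25, 1, r57], [r27, r57, 1]], [r12, r15, r17])

# Path-tracing check: the first equation rebuilds r12 from the paths.
assert abs(a + b * d + c * f + c * d * e + b * e * f - r12) < 1e-9

# Residual path v from the unknown factors X.
sum_d = a * a + b * b + c * c + 2 * (a * b * r25 + a * c * r27 + b * c * r57)
v = math.sqrt(1 - sum_d)
print({k: round(x, 3) for k, x in zip("abcdefv", (a, b, c, d, e, f, v))})
```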


The values of the coefficients of determina- 
tion were secured from the following equa- 
tions based on the path coefficients. 





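$$\begin{aligned}
d_{1.2} &= a^{2} = .2256\\
d_{1.5} &= b^{2} = .0729\\
d_{1.7} &= c^{2} = .0396\\
d_{1.25} &= 2ab\,r_{25} = .1357\\
d_{1.27} &= 2ac\,r_{27} = .0760\\
d_{1.57} &= 2bc\,r_{57} = .0478\\
d_{1.x} &= 1 - \Sigma d = .4024\\
\text{Total} &= 1.0000
\end{aligned}$$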


The sum of the direct and indirect coefficients of determination ($\Sigma d$) represents the total determination of 1 due to factors 2, 5, and 7. The square root of this sum should equal $R_{1.257}$.

$$\Sigma d = .5976, \qquad \sqrt{\Sigma d} = .7731, \qquad R_{1.257} = .7731$$

This serves as a check on the accuracy of the computation and exemplifies the relationship of the two techniques.
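The coefficients of determination and the check against R1.257 can be recomputed from the printed path coefficients. In the following minimal sketch, each direct term is p² and each joint term is 2·p_j·p_k·r_jk. The joint term for variables 2 and 5 comes to about .1357, and the six terms sum to .5976. The dictionary layout is just a convenience for the example.

```python
import math

a, b, c = 0.475, 0.270, 0.199          # direct paths 2 -> 1, 5 -> 1, 7 -> 1
r25, r27, r57 = 0.529, 0.402, 0.445    # correlations among the predictors

direct = {"2": a * a, "5": b * b, "7": c * c}
joint = {("2", "5"): 2 * a * b * r25,
         ("2", "7"): 2 * a * c * r27,
         ("5", "7"): 2 * b * c * r57}

total_known = sum(direct.values()) + sum(joint.values())
unknown = 1 - total_known               # d_1.x, left to unmeasured factors
print({k: round(v, 4) for k, v in joint.items()})
print(round(total_known, 4), round(unknown, 4), round(math.sqrt(total_known), 4))
# 0.5976 0.4024 0.7731
```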
The indirect or combined influences were separated on the basis of the relative weights of the direct coefficients of determination in the following manner:

$$\begin{aligned}
d_{1.25}\ \text{to 2} &= .1357 \times \frac{.2256}{.2256 + .0729} = .1026, & d_{1.25}\ \text{to 5} &= .1357 \times \frac{.0729}{.2256 + .0729} = .0331\\
d_{1.27}\ \text{to 2} &= .0760 \times \frac{.2256}{.2256 + .0396} = .0647, & d_{1.27}\ \text{to 7} &= .0760 \times \frac{.0396}{.2256 + .0396} = .0113\\
d_{1.57}\ \text{to 5} &= .0478 \times \frac{.0729}{.0729 + .0396} = .0310, & d_{1.57}\ \text{to 7} &= .0478 \times \frac{.0396}{.0729 + .0396} = .0168
\end{aligned}$$

Direct and indirect coefficients of determination were combined in the following manner:

$$\begin{aligned}
d_{1.2} &= .2256 + .1026 + .0647 = .3929\\
d_{1.5} &= .0729 + .0331 + .0310 = .1370\\
d_{1.7} &= .0396 + .0113 + .0168 = .0677\\
\Sigma d &= .5976\\
d_{1.x} &= 1.0000 - .5976 = .4024
\end{aligned}$$

Total variance in algebra problem solving ability is attributed to the different independent variables in terms of the following percents:

1. 39.29 percent of the variance in algebra problem solving ability is due to variation in algebra computation as measured.
2. 13.70 percent of the variance in algebra problem solving ability is due to variation in arithmetic problems as measured.
3. 6.77 percent of the variance in algebra problem solving ability is due to variation in intelligence as measured.
4. 40.24 percent of the variance in algebra problem solving ability is due to the influence of other factors over and above algebra computation, arithmetic problem solving, and intelligence.

A similar procedure was applied to the various groups of algebra problem solving, algebra computation, and arithmetic problem solving as combined with each of the remaining variables in turn. The directions of influence as assumed a priori were the following:

Variance in each independent variable contributes directly to variance in the criterion.

Variance in 2 contributes indirectly through 4 to variance in the criterion.

Variance in 3, 5, 6, 7, 8, and 10 contributes indirectly through 2 to variance in the criterion.

Variance in 5 contributes indirectly through 3 and 4 to variance in the criterion.

Variance in 6, 7, 8, and 10 contributes indirectly through 5 to variance in the criterion.

The resulting analysis is given in Table VII.


TABLE VII

THE RELATIVE CONTRIBUTION OF EACH VARIABLE WHEN COMBINED WITH ALGEBRA COMPUTATION
AND ARITHMETIC PROBLEM SOLVING TO THE TOTAL VARIANCE IN
ALGEBRA PROBLEM SOLVING ABILITY

(% Variance)

Variable combined with        Total     Total      Algebra       Arithmetic   Variable
algebra computation and       Unknown   Obtained   Computation   Problems     Combined
arithmetic problems
Algebra vocabulary            40.22     59.78      35.11         15.76        8.91
Algebra analysis              40.23     59.77      38.67         15.20        5.90
Intelligence                  40.24     59.76      39.29         13.70        6.77
Reading                       40.35     59.65      40.25         12.65        6.67
Memory (fables)               41.63     58.37      39.58         16.51        2.28
Arithmetic computation        42.27     57.73      41.92         15.20         .61
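The proportional separation of the joint coefficients described earlier (each joint d divided between its two variables in the ratio of their direct d values) can be sketched as follows. `split_joint` is a hypothetical helper. It does not round each share to four places before adding, so its totals may differ from the printed .3929 and .0677 in the last digit.

```python
# Direct and joint coefficients of determination from the path analysis.
direct = {"2": 0.2256, "5": 0.0729, "7": 0.0396}
joint = {("2", "5"): 0.1357, ("2", "7"): 0.0760, ("5", "7"): 0.0478}


def split_joint(direct, joint):
    """Give each variable its direct d plus a share of every joint d it
    enters, the share being proportional to the two direct d values."""
    combined = dict(direct)
    for (p, q), value in joint.items():
        weight_p = direct[p] / (direct[p] + direct[q])
        combined[p] += value * weight_p
        combined[q] += value * (1 - weight_p)
    return combined


combined = split_joint(direct, joint)
for var, value in combined.items():
    print(f"d1.{var} = {value:.4f} ({100 * value:.2f} percent)")
print(f"d1.x = {1 - sum(combined.values()):.4f}")
```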
CONCLUSIONS AND INTERPRETATION


To determine the relative contributions of a group of variables to ability in solving algebra verbal problems, a battery of tests was administered to 284 pupils who had completed one year of high school algebra. From the intercorrelations of the test scores the multiple correlation coefficient was obtained, yielding an R of .8057.

As an additional refinement, coefficients of determination were computed for groups of variables, algebra verbal problems, algebra computation, and arithmetic problem solving being included in every combination with each of the other tests taken one at a time, since the beta weights of these three variables were consistently the highest of those obtained.

The several methods of beta weights, mul- 
tiple correlation, and path coefficients indicate 
similar trends in the relative importance of 
the variables measured. 

From the coefficients of determination it 
will be seen that: 

1. From 35.11 to 41.92 percent of the vari- 
ance in algebra verbal problem solving ability 
is due to variance in algebra computation 
ability. 

2. From 13.70 to 16.51 percent of the vari- 
ance in algebra verbal problem solving ability 
is due to variance in arithmetic problem solv- 
ing ability. 

3. When the three variables of algebra verbal problem solving, algebra computation, and arithmetic problem solving are combined with each of the remaining variables in turn, the variance in algebra problem solving ability is
(a) 8.91 per cent due to variance in algebra vocabulary;
(b) 6.77 per cent due to variance in intelligence;
(c) 6.66 per cent due to variance in reading;
(d) 5.90 per cent due to variance in algebra analysis;
(e) 2.28 per cent due to variance in memory;
(f) .61 per cent due to variance in arithmetic computation.

A similar trend is shown by the size of the multiple correlation coefficients for each of the above groupings, i.e., algebra verbal problem solving, algebra computation, and arithmetic problem solving combined with:
(a) algebra vocabulary yields ...
(b) intelligence yields ...
(c) reading yields ...
(d) algebra analysis yields ...
(e) memory yields ...
(f) arithmetic computation yields ...
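Because the square root of the summed coefficients of determination equals R (the check applied to $R_{1.257}$ earlier), the multiple correlation for each grouping can be estimated from the "Total Obtained" column of Table VII. This is a sketch that assumes the relation holds for every row; the values it prints are derived estimates, not figures reported in the study.

```python
import math

# "Total Obtained" percent of variance for each grouping in Table VII
# (algebra computation and arithmetic problems plus the named variable).
total_obtained = {
    "algebra vocabulary": 59.78,
    "algebra analysis": 59.77,
    "intelligence": 59.76,
    "reading": 59.65,
    "memory": 58.37,
    "arithmetic computation": 57.73,
}

# R squared equals the total obtained share, so R is its square root.
for variable, percent in total_obtained.items():
    print(f"{variable:24s} R = {math.sqrt(percent / 100):.4f}")
```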
4. Approximately 40 percent of the variance in algebra problem solving ability is due to the influence of other causes and conditions. These may include aspects of the variables unmeasured by the tests used and effort, interest, attitude, and other elements over and above whatever contribution they may have made through the independent variables.

The distribution of variance due to the factors of intelligence, vocabulary, analysis, reading, and memory points to algebra problem solving ability as being dependent upon a number of integrated abilities.

Since the beta weight for algebra computation is considerably larger than that for any other variable and since more than one third of the variance in algebra verbal problem solving is due to variance in algebra computation, it is evident that facility in algebra computation is by far the most important factor in ability to solve algebra verbal problems. These findings are analogous to those of Engelhart (7), who found that 42 per cent of the variance in arithmetic verbal problem solving is due to variance in arithmetic computation.

The second most important factor is consistently that represented by the test of arithmetic problems. Since tests of intelligence, vocabulary, and problem analysis were employed in the computation of coefficients of determination, it is not clear what psychological function is represented by the arithmetic problem test over and above the variables added in this study. It appears that, in addition to the formal technique of analysis, there is an approach to the solving of verbal problems which is of a similar nature in both arithmetic and algebra.

The beta weights reveal positive contribu- 
tions made by intelligence, vocabulary, read- 
ing, analysis, memory, and arithmetic compu- 
tation, although the determination of the 
unique variance of each is not feasible due to
the complexity of the technique of path co- 
efficients when applied to a considerable num- 
ber of variables. The relatively small percent- 
age of variance due to intelligence may be 
explained by two elements. 

1. The test used to measure intelligence is designed to measure a cognitive G factor over and above any facility for reading, word skill, or any type of mathematics based on school aptitude. This test aims rather at a measure of intelligence as intellectual insight and the power to do abstract thinking.
2. While the algebra verbal problem solving test is composed of items representative of the average high school algebra textbook and standardized test, inspection points to the fact that they are not of such a level of difficulty as to demand any considerable degree of cognitive power.

The maximum contribution of any one of these variables is about 9 percent. In regard to the variance accounted for by these latter variables it may be noted that:

(a) All but the last two (memory and arithmetic computation) are probably significant.
(b) All are positive.
(c) All are relatively distinct.

It is significant, however, that entirely apart from algebra computation and arithmetic problem solving, the variables of vocabulary, intelligence, reading, analysis, memory, and arithmetic computation make small but definite contributions to variance in algebra verbal problem solving.
AN EXPERIMENTAL STUDY OF TWO TYPES
OF ARITHMETIC PROBLEMS* 


Edwin W. BRAMHALL


Principal, School No. 23 
Paterson, New Jersey 


The Problem.—The purpose of this study 
was to determine through experimentation 
the relative effectiveness of two types of arith- 
metic problems in the improvement of 
problem-solving ability of sixth-grade pupils. 


The two types of problems considered were 
those described by Wheat’ as the “conven- 
tional type” and the “imaginative type.” The 
“conventional” problem is the problem which 
is stated in the simplest, shortest, and most 
direct manner possible, with no attempt to 
give the pupil a picture of the larger, more 
complete situation than that directly con- 
cerned with the mathematical problem at 
hand. The following is an example of the 
conventional type problem: 


How much will six baseballs cost at $1.25 
each with a discount of 10 per cent? 


The imaginative problem is one which in- 
cludes elements of a larger situation than that 
expressed by the conventional type. It con- 
tains details not necessary to the actual solu- 
tion of the problem but which are intended to 
add interest to the problem to the end that 
the pupil may more readily understand the 
situation and, thereby, be more likely to 
achieve the correct solution. The following is 
an example of the imaginative type problem: 


Bill Jones is manager of the Tigers base- 
ball team which is tied for the champion- 
ship with the Maple Leaf team. The final 
game is to be played at the Tigers’ home 
field and the Tigers must supply the balls 
for the game. Mr. Williams, who owns a 
sporting-goods store, told Bill that he would 
give the Tigers a discount of 10 per cent on 
the balls for the game. How much, then, 
would Bill have to pay for six balls if the 
regular price is $1.25 each? 
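Stripped of narrative, both versions of the baseball problem call for the same computation. A minimal sketch; the function name `discounted_cost` is hypothetical, and `Decimal` is used so the money amounts are not subject to binary rounding.

```python
from decimal import Decimal


def discounted_cost(quantity, unit_price, discount_percent):
    """Cost of `quantity` items at `unit_price`, less a percentage discount."""
    gross = quantity * unit_price
    discount = gross * discount_percent / 100
    return gross - discount


# Both wordings reduce to six balls at $1.25 each, 10 per cent off.
print(discounted_cost(6, Decimal("1.25"), 10))
```

Either wording gives $6.75: $7.50 for six balls less a discount of $.75.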

* Abstract of thesis submitted in partial fulfillment
Sources of the Problem.—The sources of
the problem considered in this study are: 

1. The recent tendency of the textbook 
writers to increase the length and scope of 
problem statements in order to increase 
pupils’ interest. 

2. The disagreement between the findings 
of a study by Wheat² and a study by Myers,
both of whom attacked the problem. Myers 
and Wheat did not use problem material in 
a teaching situation over a period of time 
but, rather, used problems in a series of tests, 
from the results of which they reached oppo- 
site conclusions. Myers concluded from his 
evidence that imaginative problems were far
superior to conventional problems, while 
Wheat concluded that any superiority which 
existed favored the conventional problems. 


Method of the Experiment.—The present 
study was a controlled group experiment conducted in the upper half of the sixth grade. The experimental group consisted of seven classes totaling two hundred thirteen pupils, and the control group consisted of seven classes totaling two hundred fourteen pupils. Both groups were given the following initial tests: Otis Group Intelligence; New Stone Reasoning Test in Arithmetic, Form 2; Woody-McCall Mixed Fundamentals, Form 1; and Gates Silent Reading, Form 1. Results of these tests indicated that there were no statistically significant differences between the two groups, and the groups were, therefore, reasonably closely equated.
For the next ten weeks after the initial 
testing the following procedure was used: 
On three days of each week the arithmetic 
periods in all classes were given over to prob- 
lem work. Periods in all classes were of the 
same length of time (forty-five minutes). 
Mimeographed problem sheets prepared by 
the experimenter were provided all pupils as
they were needed. The problems for both 


² Wheat, op. cit.
³ G. Myers, ... Arithmetic. Boston: ...
groups were exactly the same except for the method of stating the problems. The control group worked conventional problems and the experimental group worked imaginative problems.

Pupils worked at their own speed and without the help of the teacher. When a pupil finished a problem or a sheet of problems he went on to the next problem or sheet.

Teachers checked problem work in the 
classroom while the pupils were working by 
passing about the room and examining work 
as pupils finished a problem. If a pupil worked a problem correctly as to method but made an error in computation, the teacher pointed out the error and had the pupil correct it before he went on to the next problem. If the teacher found the method of working to be incorrect, the pupil was told that he had not “thought out” the problem correctly and was asked to do the problem again. In
this case the teacher did not point out the 
error in reasoning. When a pupil failed to 
reason out a problem in three attempts he 
was told to leave the problem and go on to 
the next. There were no prescribed or re- 
quired methods of solution; any method 
which indicated correct reasoning of the prob- 
lem was accepted. All problem work was 
done during the classroom period. 

While the experiment was being conducted, 
teachers went on with their regular work in 
the fundamental processes and with new ma- 
terial on the days not given over to problem 
work. 

Problems were selected in accordance with 
the requirements of the regular arithmetic 
course of study and included work on all 
topics covered by the course. 

At the close of the experimental period all 
pupils were given the following final tests: 
New Stone Reasoning Test in Arithmetic, Form 1; Woody-McCall Mixed Fundamentals, Form 2; Gates Reading, Form 2; and the “Problem Test,” consisting of ten pairs of representative problems of the two types used in the experiment.

Results of these tests were compared in 
order to determine whether or not any sig- 
nificant differences had been brought about 
during the experiment.

In order to measure as accurately as possible the results of the experiment, one hundred forty closely matched pairs of pupils
were selected from the total groups. These 
pupils were paired according to intelligence 
and reasoning ability, as shown by the initial
tests. No pair was more widely separated 
than by three points on the intelligence scale 
or two points on the intelligence scale and 
two months on the reasoning scale. 


Results of the Experiment.—A summary of 
statistical data is given in Table I. Material 
in this summary shows that there were no 
statistically significant differences between 
experimental and control pupils on_ initial 
tests, or on final tests. In fact, the differences 
remain approximately the same for both in- 
itial and final tests. Mean gains in reasoning 
scores are practically the same for both ex- 
perimental and control groups, there being no 
significant difference between them. 

Data for paired groups show much the same 
results as those for total groups. 

There were no close correlations found be- 
tween gains and other factors measured in 
the testing. 

Both total groups and paired groups, ex- 
perimental and control, made significant gains 
during the course of the experiment. These 
gains amounted to more than eight months’ 
increase in grade scores in the experimental 
period of ten weeks during which time pupils 
worked without the help of teachers. 

All scores and gains are given in terms of 
grade scores. 


Conclusions.—There is no statistically sig- 
nificant difference between the effectiveness 
of the conventional and imaginative type 
problems in improving problem-solving ability 
when time is kept constant and pupils work 
at their own speed. 

Since experimental pupils achieved gains 
equal to those of control pupils, although they 
worked fewer problems, the imaginative prob- 
lems might be considered, problem for prob- 
lem, slightly more effective than conventional 
problems. Results of the final Problem Test 
(prepared by the experimenter) showed a 
slight advantage for experimental pupils 
(about five per cent). However, this advan- 
tage is not great enough to be considered a 
basis for the exclusive use of imaginative 
problems, since the advantage would be offset 
by the increased time necessary in the work- 
ing of imaginative problems. 

The relatively large gains in reasoning 
scores made by all groups appear to indicate 
that a good method of teaching problem 
solving is to give pupils many opportunities 
to solve problems in their own way and at 
TABLE I

SUMMARY OF STATISTICAL DATA

                                            Experimental   Control                Chances of Real
                                            Group          Group     Difference   Difference*
Comparison of mean scores in reasoning
for total groups†
    Initial test                            6.63           6.51      .12          7 to 1
    Final test                              7.48           7.38      .10          6 to 1
Comparison of mean gains of total groups     .90            .91      .01          8 to 1
Comparison of mean scores in reasoning
for paired groups
    Initial test                            6.62           6.61      .01          ...
    Final test                              7.50           7.41      .09          3.9 to 1
Comparison of mean gains of paired groups    .86            .86      0            0

Coefficients of correlation between gains in reasoning and scores in:

                        Intelligence   Fundamentals   Reading   Reasoning
Total groups
    Experimental           -.26           .06           .01       -.41
    Control                 ...           .11          -.05       -.27
Paired groups
    Experimental           -.01           ...           .03       -.39
    Control                -.07           .11          -.09       -.18

* The “Chances of Real Difference” are based on the normal curve, chances of 369 to 1 being necessary to ensure practical certainty.

† The expression “total groups” is used to distinguish between the total groups of 213 experimental pupils and 214 control pupils and the paired groups of 140 pupils each.
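The footnote's 369-to-1 criterion matches the odds that a normal deviate falls within three standard errors of zero, taken two-tailed. The sketch below converts a critical ratio into "chances to 1" under that assumed two-tailed convention. It illustrates the normal-curve arithmetic only; the study's own chances may have been computed under a different (for example one-tailed) convention, and `chances_to_one` is a hypothetical name.

```python
import math


def chances_to_one(z):
    """Odds that a normal deviate lies within +/- z standard errors,
    expressed as 'chances to 1' (area inside / area outside)."""
    p_inside = math.erf(z / math.sqrt(2))
    return p_inside / (1 - p_inside)


# The footnote's criterion for practical certainty: three standard errors.
print(f"{chances_to_one(3.0):.0f} to 1")
```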


their own speed without a great deal of 
teacher influence. This should be checked by 
experiment, although it is in agreement with 
Hanna’s conclusion that children achieve 
more in problem solving when left to their 
own devices.* 

Correlations between gains in problem solv- 
ing and reading, arithmetic fundamentals, 


*P. R. Hanna. Arithmetic Problem Solving. New York: 
Bureau of Publications, Teachers College, Columbia Univer- 
sity, 1929 


intelligence, and reasoning ability, for the group tested, are all small and unreliable. Except for fundamentals, all correlations are negative. The fact that the correlation for fundamental processes is positive, though small, leads to the belief that, if these
processes were taught with meaning and in 
their true relationships, an important contri- 
bution to the improvement of problem solving 
might be made. 
THE RELATIVE VALUE OF SOUND MOTION PICTURES
AND STUDY SHEETS IN SCIENCE TEACHING* 
Roy V. MANEVAL 


Science Instructor, Will Rogers High School 
Tulsa, Oklahoma 


The use by school administrators and 
teachers of educational sound motion pictures 
has increased rapidly during the last decade. 
Their value in the classroom is usually accepted without question. Considerable research has been done using sound films, especially in the field of science. However, much
more research is needed in order to justify 
their wide acceptance by educators. 


Several investigations have been made to 
determine the value of sound films when used 
in various ways. Arnspiger¹ found that groups
aided in their instruction by sound films were 
superior to those taught without the use of 
such aids. The superiority ranged from 18 to 
34 per cent. Clark² made a study in which he
found that the sound film is as effective as 
actual classroom demonstrations and more 
effective than a silent film in conveying exact 
information. He also found that educational 
films can be used effectively as a means of 
increasing and arousing students’ interests. 
In this respect sound films are apparently 
more valuable than silent films. Einbecker’s³
experiment showed that verbal accompani- 
ments increase the comprehension over that 
secured from the film without caption or com- 
ment. He also found that silent motion pic- 
tures accompanied by the teacher’s comments 
are superior to both the talking picture and 
the silent picture with respect to the learning 
of new technical words or unfamiliar names. 
Hansen⁴ found that verbal explanation accompanying an educational talking picture
can be presented as effectively by the class- 
room teacher as by the medium of the re- 
corded voice from the sound motion picture 


* Field Study No. 2, Colorado State College of Education, Greeley, Colorado, 1939.

¹ V. C. Arnspiger, Measuring the Effectiveness of Sound Pictures As Teaching Aids. Contributions to Education, No. 365. New York: Teachers College, Columbia University, 1933.

² C. C. Clark, “The Talking Movie and Students’ Interests,” Science Education, XVII (December, 1934), 312-320.

³ W. F. Einbecker, “Comparison of Verbal Accompaniments to Films,” Education, LIII (February, 1933), 343-347.

⁴ ... E. Hansen, “Verbal Accompaniment of Educational ... The Recorded Voice vs. The Voice of the Classroom ...,” Journal of Experimental Education, V (September, ...
projector. Rulon⁵ found that film-supplemented
classroom procedure resulted in an
increase of pupil learning in excess of 20 per 
cent, when measured in terms of permanent 
acquisitions. In terms of retained achieve- 
ment measured by giving the tests three 
months later, the film-supplemented procedure 
was 38.5 per cent more efficient than the unsupplemented method. The findings of Westfall’s⁶
investigation showed that explanations
prepared by the teacher from material fur- 
nished with the film, a lecture furnished with 
the film and read by the teacher, and the 
usual captions were about equal as aids to 
understanding the contents of the film; these 
three forms of verbal accompaniment were 
superior to long captions. 

A study was conducted by the writer⁷ at
Horace Mann Junior High School, Tulsa, 
Oklahoma, during the school year of 1937- 
1938, in which he found that, for immediate 
recall, study sheets were superior to sound 
films as a method of direct teaching. In the 
teaching of four subjects to 140 pairs of 
eighth grade science pupils, the data show 
that for each subject the study sheets were 
superior to a statistically significant degree. 
The data secured from the tests given for 
delayed recall did not give statistically sig- 
nificant evidence to indicate that either 
method of teaching was better. 


EXPERIMENTAL PROCEDURE 


The study here reported, like the first one 
by the writer, was made to determine the 
value of two methods of direct instruction:
(1) by the use of educational sound motion 
pictures, and (2) by the use of printed study 
sheets. The data were collected during the 
school year of 1938-1939 in the Horace Mann 


⁵ P. J. Rulon, The Sound Motion Picture in Science Teaching. Cambridge: Harvard University Press, 1933.

⁶ L. H. Westfall, A Study of Verbal Accompaniments to Educational Motion Pictures. Contributions to Education, No. 617. New York: Teachers College, Columbia University, 1934.

⁷ R. V. Maneval, “The Relative Value of Sound Motion Pictures and Study Sheets in Science Teaching,” Science Education, XXIII (February, 1939), 83-86.
and Theodore Roosevelt Junior High Schools
of Tulsa, Oklahoma. Eighth grade science 
pupils were paired according to mental age, 
science reading ability, and sex. There were 
280 pairs remaining at the end of the experi- 
ment. One group of the pupils was called the 
X-Group and the other the Y-Group. 


Eight sound motion pictures on science 
subjects produced by Erpi Classroom Films, 
Inc., were selected. Four teaching units were 
made of these by combining them as follows: 
Ground Water and Volcanoes in Action, The 
Work of Rivers and The Work of the Atmos- 
phere, Water Power and Conservation of 
Natural Resources, and The House Fly and 
The Geological Work of Ice. The study sheets 
were made to resemble parts of science texts 
and workbooks. The lectures which are a 
part of Erpi films were used as the bases of 
the text sections; they were changed only 
enough to make them understandable without 
the motion pictures. The workbook sections 
consisted of uncompleted statements relative 
to the main ideas of each text section. 


Multiple choice tests were constructed for 
each unit of instruction. Each test consisted 
of fifty-six items, twenty-eight for each of the 
two subjects of the unit. The reliability of 
each test was determined by the chance 
halves method, after which the Spearman-Brown formula was applied. The reliabilities
of the tests ranged from .839 to .892. 
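The chance-halves procedure correlates scores on two halves of a test and then steps the correlation up to full-test length with the Spearman-Brown formula, $r_{\text{full}} = 2r_{\text{half}}/(1 + r_{\text{half}})$. A minimal sketch; the half-test correlations it prints are back-calculated from the reported range of .839 to .892 and are not values given in the study.

```python
def spearman_brown(r_half):
    """Step a split-half correlation up to full-test reliability."""
    return 2 * r_half / (1 + r_half)


def half_from_full(r_full):
    """Invert the step-up: the split-half r implied by a full-test value."""
    return r_full / (2 - r_full)


for r_full in (0.839, 0.892):
    r_half = half_from_full(r_full)
    print(f"half-test r = {r_half:.3f} -> reliability {spearman_brown(r_half):.3f}")
```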


On the first day of experimental teaching, 
the X-Group was taught by the use of the 
sound motion pictures and the Y-Group with 
study sheets. In teaching with the sound 
films, each film was viewed twice by the 
pupils. The study sheet group read each text 
section twice and then filled in the blanks of 
the workbook. The time for instruction was 
the same for both methods. No comments 
were made by the teachers concerning the 
content of the instructional matter. The day 
after teaching, tests based on the unit taught 
were administered to all pupils. Thirty days 
later they were tested for delayed recall. 
They were then retaught by the same meth- 
ods, using only one-half the time used for 
the original teaching. The next day all pupils 
were again tested. The groups were then ro- 
tated before each of the next three units of 
teaching. 
STATISTICAL TREATMENT OF DATA


The means for the scores on each test were 
computed, when the tests were given for im- 
mediate recall, for delayed recall, and for 
immediate recall after reteaching. This was 
done separately for each unit of teaching, and 
for each of the two methods of teaching. 

The formula given by Peters and VanVoorhis⁸ was used for determining the S.E.'s of the differences of the means, when the pupils are paired on a fallible criterion:

$\dfrac{\sigma_d}{\sqrt{N}} \cdots$

The $\sigma_d$ refers to the standard deviation of the differences between test scores of each pair. The $r_{1c}$ refers to the correlation between test scores by one method and composite scores, or the scores used for the purpose of matching; and $r_{2c}$ refers to the correlation between test scores of the second method and the composite scores.

The experimental coefficients were deter- 
mined by dividing the differences of the 
means by 2.78 times the S.E. of the difference 
of the means. 

In Table I are shown the means of the test 
scores, the differences of the means, the S.E.’s 
of the differences of the means, and the ex- 
perimental coefficients, when the tests were 
used for immediate and delayed recall, and 
for immediate recall after reteaching. Five of the twelve experimental coefficients are greater than 1.00, and so are considered statistically significant. Of the four in favor of study sheets, three were on the first unit of
teaching. The film method was superior to a 
statistically significant degree in only one 
test, that being for immediate recall after re- 
teaching for the fourth unit. 
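The experimental coefficient is the difference of the means divided by 2.78 times its standard error, so a coefficient beyond 1.00 in either direction marks a statistically significant difference. The sketch below recomputes the twelve coefficients from the differences and S.E.'s of Table I and counts the significant ones; the function name and data layout are illustrative. Positive differences are taken to favor study sheets, and the last entry is taken as negative because the text reports that the film method was superior on that test.

```python
def experimental_coefficient(diff, se):
    """Difference of the means divided by 2.78 times its S.E."""
    return diff / (2.78 * se)


# (difference of means, S.E. of the difference) for the four units,
# in the order immediate recall, delayed recall, recall after reteaching.
table_one = {
    "immediate": [(2.99, 0.373), (0.73, 0.322), (0.98, 0.338), (-0.06, 0.306)],
    "delayed": [(1.98, 0.394), (0.03, 0.326), (-0.26, 0.356), (-0.48, 0.325)],
    "after reteaching": [(1.31, 0.365), (-0.51, 0.326), (-0.92, 0.337), (-1.26, 0.299)],
}

coefficients = [experimental_coefficient(d, se)
                for rows in table_one.values() for d, se in rows]
favor_sheets = sum(ec > 1.0 for ec in coefficients)
favor_films = sum(ec < -1.0 for ec in coefficients)
print(f"{favor_sheets} favor study sheets, {favor_films} favor sound films")
```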

A further study of the data was made by 
dividing the test scores of the pupils into three 
groups, according to composite pairing scores. 
The three groups used were: (1) above the 
seventy-fifth percentile, (2) between the 
seventy-fifth and twenty-fifth percentiles, and 
(3) below the twenty-fifth percentile. The 
same procedure as that given above was used 
in calculating the experimental coefficients of these groups. The differences of the means, their corresponding S.E.'s and the experimental coefficients are given in Table II. For
Group A, the immediate recall tests for each 


⁸ C. C. Peters and W. R. VanVoorhis, Statistical Procedures and Their Mathematical Bases. State College, Pennsylvania: Pennsylvania State Col
TABLE I

THE MEANS, THE DIFFERENCES OF THE MEANS, S.E.'S OF
THE DIFFERENCES OF THE MEANS, AND THE EXPERIMENTAL
COEFFICIENTS, WHEN THE TESTS WERE USED FOR IMMEDIATE
AND DELAYED RECALL, AND FOR IMMEDIATE RECALL AFTER
RETEACHING, IN A STUDY INVOLVING TWO GROUPS OF 280
CASES EACH

                                                  Means             Differences   S.E. of       Experimental
Subjects                                          Study    Sound    of Means      Differences   Coefficients
                                                  Sheet    Film                   of Means
IMMEDIATE RECALL:
1. GROUND WATER AND VOLCANOES IN ACTION           36.85    33.86     2.99         .373           2.88
2. WORK OF RIVERS AND WORK OF ATMOSPHERE          34.74    34.01      .73         .322            .82
3. WATER POWER AND THE CONSERVATION OF
   NATURAL RESOURCES                              40.65    39.67      .98         .338           1.04
4. HOUSE FLY AND GEOLOGICAL WORK OF ICE           38.42    38.48     -.06         .306           -.07
DELAYED RECALL:
1. GROUND WATER AND VOLCANOES IN ACTION           33.95    31.97     1.98         .394           1.81
2. WORK OF RIVERS AND WORK OF ATMOSPHERE          31.59    31.56      .03         .326            .03
3. WATER POWER AND THE CONSERVATION OF
   NATURAL RESOURCES                              36.28    36.53     -.26         .356           -.25
4. HOUSE FLY AND GEOLOGICAL WORK OF ICE           35.42    35.90     -.48         .325           -.53
IMMEDIATE RECALL AFTER RETEACHING:
1. GROUND WATER AND VOLCANOES IN ACTION           38.41    37.10     1.31         .365           1.29
2. WORK OF RIVERS AND WORK OF ATMOSPHERE          36.63    37.14     -.51         .326           -.56
3. WATER POWER AND THE CONSERVATION OF
   NATURAL RESOURCES                              41.96    42.88     -.92         .337           -.98
4. HOUSE FLY AND GEOLOGICAL WORK OF ICE           39.94    41.20    -1.26         .299          -1.52
TABLE II

                                         Differences of Means      S.E.                  Experimental Coefficient
Subjects                                 A      B      C           A      B      C       A      B      C
IMMEDIATE RECALL:
1. GROUND WATER AND VOLCANOES IN ACTION  5.09   2.88   1.16        1.02   .82    .92     ...    1.26   ...
2. WORK OF RIVERS AND WORK OF
   ATMOSPHERE                            2.65   .38    -.52        .89    .70    .64     1.07   .19    ...
3. WATER POWER AND THE CONSERVATION
   OF NATURAL RESOURCES                  1.97   .70    ...         .72    .67    ...     ...    .38    ...
4. HOUSE FLY AND GEOLOGICAL WORK OF ICE  1.32   .02    -1.60       .74    .56    ...     ...    .01    -.53
DELAYED RECALL:
1. GROUND WATER AND VOLCANOES IN ACTION  ...    ...    ...         ...    ...    ...     ...    ...    ...
2. WORK OF RIVERS AND WORK OF
   ATMOSPHERE                            1.31   -.34   -.50        .92    .73    .94     .51    ...    ...
3. WATER POWER AND THE CONSERVATION
   OF NATURAL RESOURCES                  1.23   -.45   ...         .77    .73    1.12    ...    ...    ...
4. HOUSE FLY AND GEOLOGICAL WORK OF ICE  1.27   -.12   -2.94       ...    .60    1.02    .55    -.07   -1.04
IMMEDIATE RECALL AFTER RETEACHING:
1. GROUND WATER AND VOLCANOES IN ACTION  2.32   1.63   -.33        .95    .74    1.17    .88    ...    ...
2. WORK OF RIVERS AND WORK OF
   ATMOSPHERE                            .66    -.23   ...         ...    .76    1.38    .28    -.12   ...
3. WATER POWER AND THE CONSERVATION
   OF NATURAL RESOURCES                  1.86   -1.55  -2.43       .65    .70    1.20    1.03   -.79   ...
4. HOUSE FLY AND GEOLOGICAL WORK OF ICE  ...    ...    ...         ...    ...    ...     ...    ...    ...














September, 1939
the first two units, the delayed recall tests for the first unit, and the immediate recall tests after reteaching for the third unit, gave statistically significant evidence in favor of the study sheet method of teaching. In Group B, only the immediate recall test of the first unit was statistically significant in favor of study sheets. In Group C, the delayed recall test for the fourth unit and the immediate recall test after reteaching for the same unit both gave statistically significant evidence in favor of the sound film method of teaching.
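The division into Groups A, B, and C by composite pairing score can be sketched with Python's `statistics.quantiles`. The boundary rule (scores falling exactly on a quartile go to the middle group) and the percentile method (the module's default "exclusive" method) are assumptions, since the study does not say how it located the percentiles; `group_by_quartile` is a hypothetical name.

```python
import statistics


def group_by_quartile(scores):
    """Split composite pairing scores into Group A (above the 75th
    percentile), Group B (25th to 75th), and Group C (below the 25th)."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    groups = {"A": [], "B": [], "C": []}
    for s in scores:
        if s > q3:
            groups["A"].append(s)
        elif s < q1:
            groups["C"].append(s)
        else:
            groups["B"].append(s)
    return groups


groups = group_by_quartile(list(range(1, 101)))
print({name: len(members) for name, members in groups.items()})
```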


CONCLUSIONS 


In this experiment the sound motion pic- 
ture and study sheet methods were compared 
as to their effectiveness for direct teaching of 
general science. Only those sound films were 
used which contained a minimum of time- 
lapse photography, micro-photography, and 
animation. 

When objective tests were administered to 
the pupils for immediate recall, delayed re- 
call, and immediate recall after reteaching, 
the data did not favor either the study sheet 
method or the sound film method of teaching 
to a degree which could be called statistically 
significant. 
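Judgments of statistical significance like those above were commonly made in studies of this period with the critical ratio: the difference between two means divided by the standard error of that difference. The abstract does not state the formula the investigator used, so the following Python sketch assumes the usual textbook form. The correlation term `r` (for matched groups), the threshold of 3.0, and all numbers in the example are assumptions for illustration, not figures from the study.

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """Standard error of a sample mean: SD / sqrt(N)."""
    return sd / math.sqrt(n)


def critical_ratio(mean_a: float, sd_a: float, n_a: int,
                   mean_b: float, sd_b: float, n_b: int,
                   r: float = 0.0) -> float:
    """Difference between two means divided by its standard error.

    r is the correlation between the two sets of scores; leave it at 0.0
    for independent groups. For matched groups a positive r shrinks the
    standard error of the difference and so raises the ratio.
    """
    se_a = standard_error_of_mean(sd_a, n_a)
    se_b = standard_error_of_mean(sd_b, n_b)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2 - 2 * r * se_a * se_b)
    return (mean_a - mean_b) / se_diff


def is_significant(cr: float, threshold: float = 3.0) -> bool:
    """Assumed convention: |CR| >= threshold counts as statistically significant."""
    return abs(cr) >= threshold


# Hypothetical test-score summaries for a study-sheet group and a sound-film group.
cr = critical_ratio(30.0, 6.0, 36, 27.0, 8.0, 64)
print(round(cr, 2), is_significant(cr))  # prints 2.12 False
```

With `r=0.5` (for example, groups matched on mental ability) the same hypothetical data give a ratio of exactly 3.0, because the standard error of the difference drops from about 1.41 to 1.0.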

It is interesting to note that both in the 
investigator’s previous study involving this 
problem and in the present experiment, the 
first of the four units taught resulted in data 
in favor of study sheets. Inasmuch as the 


pupils in the experiment had not been
previously instructed by sound films, they possibly
considered the instruction another picture 
show, even though they had been informed 
that they would be tested on the information 
given by the films. In the first study and in 
the present investigation the data in favor of 
the sound film method tended to increase as 
the studies progressed. In this study each test 
for delayed recall gave more evidence in 
favor of sound films than did the correspond- 
ing test for immediate recall. Likewise, each 
test for immediate recall after reteaching 
gave more evidence in favor of sound films 
than did the corresponding test for delayed 
recall. 

From this experiment it can be seen that 
pupils of higher mental ability tend to be 
taught more effectively by the study sheet 
method; the average group seems to be taught 
equally well by either method; and pupils of 
lower mental ability tend to be taught more 
effectively by the sound film method. Read- 
ing difficulties of those pupils classified as of 
lower mental ability were possibly partially 
overcome by the use of sound films. 

Further research could be made with this 
same problem, using longer periods for de- 
layed recall. This experiment could be re- 
peated profitably with pupils of higher grade 
level. The effect of previous instruction by 
the use of sound films seemed to have an
influence on the results of this investigation;
therefore, another study should be made using 
pupils who have had experience in the use of 
sound films. 














RADIO LISTENING ACTIVITIES OF CHILDREN 


Weston R. CLARK 
University of Maryland 


Since the first modern broadcast in 1920, 
radio has experienced a phenomenal growth 
of tremendous psychological and social im- 
portance. It is reliably estimated that there 
were 33,000,000 radios in use in the United 
States as of January 1, 1937; that 78,000,000 
persons are habitual listeners;¹ that there are
one billion radio listening hours per week; 
that the American public is spending in one 
way or another about $700,000,000 a year 
for its radio entertainment.² Approximately
sixty thousand schools (1935), about one- 
fourth of all the schools in the nation, have 
receiving sets reaching nearly six million
children.³ Radio reaches into a greater
proportion of homes than the newspaper, the motion
picture, the church, or the school. Its influ- 
ence on the child during his average of more 
than two hours daily listening is enhanced 
by its capacity for gripping the listener with 
its appeal. The innovation of the radio has 
wrought revolutionary changes comparable 
with those of other major methods in human 
communication, and its ultimate conse- 
quences are unknown. 


Research in the field of children’s radio 
listening has experienced its greatest activity 
during the last six years. No broadcasts for 
children other than bedtime stories were pre- 
sented until the spring of 1929, and there was 
no marked trend in that direction until the 
fall of 1931. Dissatisfaction with and criticism of
radio programs by parents and welfare groups 
stimulated investigation of children’s habits 
with regard to the radio, and in 1933 several 
studies appeared in various parts of the coun- 
try. These early studies were very informal 
and of value merely in indicating interests 
and trends. Considering the importance of 
the subject, relatively few investigations have 
been reported to date,⁴ and among these it is


¹ Waldo Abbot. Handbook of Broadcasting. New York:
McGraw-Hill Book Company, 1937.

² Columbia Broadcasting System. Lost and Found. New
York: Columbia Broadcasting System, 1935.

³ Hadley Cantril and Gordon W. Allport. The Psychology
of Radio, p. 251. New York: Harper and Brothers, 1935.

⁴ A comparison of the findings of this study with those of
other studies in the field has been made by the author and
will appear under the title of “Radio Listening Habits of
Children” in a future issue of the Journal of Social Psy-
chology, 1940, Vol. TI, No. 2.


the exception when procedures of investigation
provide more than an indication of what
might be so.


The present investigation,⁵ of which this
report is a brief abstract, deals with the relation
of age, sex, rural and urban life, intelligence
scores, and school grades to the radio-listening
habits, interests, and reactions of white children
in Washington, D. C., and Fairfax County,
Virginia. The study also seeks to ascertain the
reactions of parents to the radio-listening
activities of their children.


Responses to a comprehensive questionnaire
were obtained from 505 children, representative
of the white public-school population of the
ages nine to eighteen in Washington, D. C., of
the rural children in Fairfax County, Virginia,
and of the boys of the National Training School
for Boys. The responses were made in the usual
classroom situation, with enrollments ranging in
size from fifteen to forty, under the direction of
the investigator and the class teacher. Questions
requiring recall of names of specific programs
referred only to “last week’s” listening.

The questionnaires were all distributed during
the first three school days of the week of
April 12 to 16, 1937. Programs broadcast by the
four Washington stations during the week
preceding the gathering of the questionnaire
data were analyzed with the aid of the program
directors and classified by them into twelve
types. The twelve program types decided upon
are the following:

1. Classical and semi-classical music
2. Religion
3. Dance, popular and novelty type
4. Comedy and variety
5. Detective, crime and mystery programs
6. Drama: general historical, romantic
7. Travel and adventure
8. Children’s programs (not otherwise listed)
9. National, public and civic affairs
10. News
11. Sports
12. Adult programs (including educational,
    labor, agriculture)

⁵ The investigation was conducted under the supervision of
Professor Mitchell Dreese while the author was in residence
at the George Washington University.

The reactions of the parents to the listening
activities of their children were obtained by a
parents’ questionnaire, which was taken to and
returned from the parent by the child. The school
grades and intelligence test scores of the children
were obtained from the records in each school.
Rank order correlations of comparable items
reported by parents and by their children were
obtained. These ranged from .70 to .92.

The primary purpose was to determine
differences among the children in each group
studied. Differences reported are determined
by statistically significant critical ratios. It
should be remembered that conclusions drawn
pertain to a specific geographical area, and
that they are based on what parents and
children reported.

A. Amount of Time Children Listen

1. Children with I.Q.’s above 130 did the
least radio listening.

2. The average weekly listening time for
all of the children was fifteen hours
and thirty-nine minutes; boys and girls
listened approximately the same; rural
children listened more than urban children
(eighteen hours, thirty minutes, and
twelve hours, forty-eight minutes);
“problem” children listened more than
“non-problem” children (fifteen hours
and twelve hours, forty-eight minutes).

3. Children fifteen to eighteen years of
age reported less listening than those
twelve to fifteen and more than those
nine to twelve. While the average
weekly listening for the children of the
three age groups (children from the
nine regular public schools in Washington,
D. C.) was twelve hours and forty-eight
minutes, those twelve to fifteen listened
most, seventeen hours and thirty-six
minutes, and those nine to twelve listened
least, nine hours and twenty-four minutes.

B. Days and Hours Children Prefer to Listen

1. Evening hours during weekdays (Monday
to Friday) were the most frequently
mentioned choices of broadcasting times.
The rural children expressed greater
preference than the urban children for the
period 7 P.M. to 12 midnight, week ends
(Saturday and Sunday). “Problem” children
had a greater preference than “non-problem”
children for week ends 7 P.M. to 12 midnight.

2. Girls expressed greater preference than
boys for the period 12 noon to 7 P.M.
Children nine to twelve had a stronger
preference for 12 noon to 7 P.M. than
the other children.

C. Radio-Station Offerings

1. Comedy and variety were allotted
more than double the amount of time
of any of the other eleven types of
programs. Classical and semiclassical
music and dance, popular, and novelty
types ranked second and third. Program
types which received a comparatively
small proportion of time were travel and
adventure, detective, crime and mystery,
and religious programs.

D. Habits and Preferences in Radio Listening

1. Whether a program will have a child
audience is largely a matter of the
hour at which it is broadcast. The
children reported having listened to all
types of radio programs, and in almost
perfect relationship to their availability.

2. Detective, crime, and mystery programs
were more interesting and comedy and
variety programs were less interesting to
children twelve to fifteen years of age
than to older or younger children.

3. Girls fifteen to eighteen listened more
to romantic and historical dramatizations
than did boys of the same age. Before
this age, the sex differences were
negligible. Boys fifteen to eighteen
listened more than girls to dance, popular,
and novelty programs. While classical
and semiclassical music was reported
infrequently, girls came to appreciate, or
at least to listen to, this type of program
earlier than boys.

4. Of the things about programs which
made them favorites with children,
excitement and humor content were more
important to children younger than
fifteen, and music content to those older
than fifteen.

5. Humor content was a much stronger
interest factor with girls than it was with
boys nine to twelve and fifteen to
eighteen years of age.

Humor was much more important to 
boys nine to twelve and fifteen to 
eighteen than to girls of the same ages. 
Music had a comparable value in mak- 
ing programs interesting to boys and 
to girls nine to twelve and fifteen to 
eighteen. Wide differences occurred 
with boys and girls twelve to fifteen 
in that the girls placed a much higher 
value on the interest factor of music. 
increased in value in 
making programs interesting to chil- 
dren with increase in intelligence for 
children above 70 I.Q. Excitement and
adventure counted more strongly for
the children of “normal” intelligence
(I.Q. 90 to 109) than for those with
intelligence scores above 130.

Music contributed more to making pro- 
grams interesting for “problem” chil- 
dren, and humor contributed more to 
the interest of “non-problem” children. 
Humor content was a more important 
interest factor in programs for urban 
than for rural children. Educational 
content was more important to rural 
children. 

Dull children (I.Q. 70 to 89) reported
greater dislike for comedy and variety
programs than the “normal” children
(I.Q. 90 to 109).


“Problem” children listened more to and
liked better than “non-problem” children
detective, crime, and mystery programs,
and listened more to sports programs.
“Non-problem” children listened more to
dance, popular, and novelty programs and
to general, historical, and romantic drama,
and listened more to and liked better
children’s programs.

Dull children (I.Q. 70 to 89) listened
more than the “normal” or bright children
to dance, popular, and novelty programs.


On the whole, greater differences were
found between the reports of children
nine to twelve and twelve to fifteen
years of age than between children
twelve to fifteen and fifteen to eighteen
years of age concerning their radio
habits, preferences, and dislikes.

With advance in age, the children reported
less listening to and greater dislike for
children’s programs; an increase in interest
in dance, popular, and novelty programs; a
stronger dislike for general, historical, and
romantic drama; and a decrease in the dislike
for classical and semiclassical music and
religious programs.

Children with an average school mark
of A listened less than those with a
mark of C to historical and romantic
drama programs and more to comedy
and variety programs. There was a
strong positive tendency for the mark
A group to be set apart from all four
of the other school mark groups.


E. Influence of Programs on Children’s Behavior

1. Sleeplessness attributed by the children
to radio programs was more common
among children younger than twelve
than among older children.

2. Children younger than twelve dreamed
at night more than older children about
the things they heard on the radio.

3. Girls nine to twelve dreamed more
than the boys about things heard over
the radio. At ages twelve to fifteen no
differences appeared, but at ages fifteen
to eighteen, the girls dreamed far
less than the boys.

4. Bright children (I.Q. 110 to 129)
dreamed more about radio programs
than the very bright (I.Q. over 130)
or the normal-intelligence children
(I.Q. 90 to 109).


5. Rural children were more inclined than
urban children to be disturbed, through
dreams and sleeplessness, by things heard
over the radio. The types of programs
which caused sleeplessness in rural
children to a greater extent than in urban
children were children’s programs and
general, historical, and romantic drama.


6. Detective, crime, and mystery programs
caused sleeplessness in a greater
percentage of urban children, and
general, historical, and romantic drama
programs caused sleeplessness in a
greater percentage of rural children.

The sleeplessness of all the children
was affected most by detective, crime,
and mystery programs. However, children
of twelve to fifteen and fifteen to
eighteen were influenced more than
those nine to twelve by romantic and
historical dramatizations; children nine
to twelve and fifteen to eighteen were
influenced more than those twelve to
fifteen by dance and novelty programs.
Detective, crime, and mystery programs
caused sleeplessness in a larger
percentage of boys than girls.
Children with intelligence scores 90 to
129 were kept awake at night by radio
programs more than those with
intelligence scores below 90 and above 120.
Sleeplessness was more common among B
than among C school-mark children because
of memories of detective, crime, and
mystery programs and dance, popular,
and novelty programs.

Dull children were helped less than 
children with normal intelligence by 
radio programs. 

Children younger than twelve years 
had been helped in various ways by 
radio programs more than those older 
than twelve. Children’s programs were 
most helpful to children nine to twelve; 
general, historical, and romantic-drama 
programs to those twelve to fifteen;
and news programs to children fifteen 
to eighteen. 

Girls had been helped more than boys 
by children’s programs and by classical 
and semiclassical music. Boys had been 
helped more by sports programs and by 
detective, crime, and mystery programs. 
Of the types of programs rural and 
urban children stated were helpful to 
them, news and adult programs were 
more helpful to rural children and chil- 
dren’s programs were more helpful to 
urban children. 

“Non-problem” children were helped 
more than “problem” children by radio 
programs. However, “problem” children
were helped more by adult programs.

Boys more than girls were influenced
by radio programs to do things which
they thought they should not have done.

There were more age differences in the
interests of children in radio programs
and in the effect programs had on their
behavior than there were in the listening
habits of children.


F. Parental Direction of Children’s Radio
Listening, as Reported by the Children

Detective, crime, and mystery programs
were the programs most frequently
objected to by parents, according to
the statements of children.

Girls were encouraged more than boys 
during the ages nine to twelve to listen 
to parent-preferred programs. Children 
younger than fifteen were more obe- 
dient to these requests of their parents 
than were older children. 

Boys were less obedient to the radio- 
listening requests of their parents than 
the girls, especially between the ages 
fifteen to eighteen. Boys were more 
progressively disobedient with increas- 
ing age. 

Rural children were more obedient 
than urban children to the radio- 
listening requests of their parents. 
Children with intelligence scores above 
130 were less obedient to the radio- 
listening requests of parents than those 
with intelligence scores 70 to 129. 

B pupils were more obedient to parents’
listening requests than C pupils.


“Problem” children were less obedient
than “non-problem” children to the
radio-listening requests of parents.
Urban children had been told more by
parents not to listen to programs of 
the detective, crime, and mystery type, 
while rural children had been told more 
not to listen to children’s programs. 
Urban children reported more often 
than rural children that their parents 
had encouraged listening to classical 
and semiclassical music; rural children 
ranked higher in their reports of par- 
ents’ approval of comedy and variety 
and children’s and adult programs. 
Parents were reported to give children
with intelligence scores above 130 more 
encouragement than those with scores 
90 to 129 to listen to adult programs. 
Children with normal intelligence were
encouraged more than bright children 
to listen to general, historical, and 
romantic programs. 

“Non-problem” children were urged by
their parents more than “problem” 
children to listen to classical and semi- 
classical music. 


G. Parents’ Attitudes Toward Their Children’s
Listening, as Reported by the Parents

1. The use of slang and the features of
excitement and excessive emotionality
made programs most objectionable to
parents.


2. There was a close similarity between
the reports of children and those of
parents regarding the amount of time
children spent listening to the radio.


3. The types of programs most frequently
approved by both rural and urban parents
were general, historical, and romantic
drama; comedy and variety; dance,
popular, and novelty; and classical and
semiclassical music programs.


4. The relationship between children’s
reports and parents’ reports of programs
parents encourage their children to listen
to was shown by a correlation of .70.


5. Rural parents considered aids to school
work, and urban parents the development
of the finer things of life, to be the chief
benefits children derived from the radio.

6. Programs considered most undesirable
by parents were those concerned with
gangsterism, crime and mystery stories,
followed by sentimental love stories.
The coefficient of correlation between
the children’s and parents’ reports in
this regard was .92.


7. Of the children’s activities most
interfered with by radio listening, urban
parents considered home work first and
rural parents considered general work
duties first.
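Agreement between children’s and parents’ reports is expressed in this abstract as rank-order correlations (ranging from .70 to .92). The usual rank-difference formula is rho = 1 - 6Σd²/(n(n² - 1)), where d is the difference between the two ranks given to the same item. The Python sketch below assumes each set of reports has first been reduced to one score per item (for example, the number of mentions of each program type); the item scores are hypothetical, and tied scores receive the average of the ranks they span, in which case the formula is only an approximation.

```python
def to_ranks(scores):
    """Rank scores from highest (rank 1) downward; ties share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(to_ranks(xs), to_ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical mentions of five program types, as reported by children and by parents.
children = [30, 25, 20, 10, 5]
parents = [25, 30, 20, 10, 5]
print(spearman_rho(children, parents))  # prints 0.9
```

Without ties the formula gives the same value as a Pearson correlation computed on the ranks; with many tied scores, a Pearson correlation on the averaged ranks would be the safer choice.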


H. Choice Between Radio-Listening and
Other Activities

1. In response to the question “Do you 
as a rule do anything else while you 
listen to radio programs?” sixty-six and 
a third per cent of the children reported
that they did other things while
listening.

2. Children younger than twelve and older 
than fifteen divided their attention 
more than children twelve to fifteen. 

3. These apparent age differences were
attributable to sex differences, for the
proportion of the boys’ responses was
almost identical for all ages.

4. The preference for going to the movies
over radio listening, and for radio listening
over reading the “funnies,” was greater
for children of fifteen to eighteen than
for those nine to fifteen. Although no
age differences in the extent to which
other activities were engaged in during
radio listening appeared among the
boys, girls fifteen to eighteen engaged
in other activities more than girls nine
to fifteen.

5. D pupils had a greater preference than 
C pupils for going to the movies over 
radio listening and a greater preference 
than B pupils for radio listening over 
reading an adventure story. 
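The abstract states that the differences reported rest on statistically significant critical ratios but does not give the formula. For findings stated as percentages of children (for example, rural versus urban children), one textbook form divides the difference between two percentages by the standard error of that difference, sqrt(p1q1/N1 + p2q2/N2). The Python sketch below assumes independent groups and a threshold of 3.0; neither assumption is confirmed by the source, and the figures in the example are hypothetical.

```python
import math


def critical_ratio_percentages(p1: float, n1: int, p2: float, n2: int) -> float:
    """Critical ratio for the difference between two independent proportions.

    p1 and p2 are proportions (0.60 for 60 per cent); n1 and n2 are the
    numbers of children on whom each proportion is based.
    """
    se1_sq = p1 * (1 - p1) / n1  # squared standard error of the first proportion
    se2_sq = p2 * (1 - p2) / n2  # squared standard error of the second proportion
    return (p1 - p2) / math.sqrt(se1_sq + se2_sq)


# Hypothetical: 60 per cent of 100 rural children vs 40 per cent of 100 urban children.
cr = critical_ratio_percentages(0.60, 100, 0.40, 100)
print(round(cr, 2), abs(cr) >= 3.0)  # prints 2.89 False (assumed threshold of 3.0)
```

When the two percentages come from the same children (for example, answers to two different questions) and the responses are positively correlated, this formula overstates the standard error, because it omits the covariance term.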


This inquiry into children’s radio-listening
activities is being extended by using other
techniques, such as covert observations of
children listening to the radio in their usual
home situation. This will serve as another
check on some of the findings reported here
and will provide additional information essential
to an understanding of the problem.






















THE RELATION OF TEACHERS’ MARKS 
TO STANDARDIZED TESTS* 


CLARENCE CARL MOORE 


Superintendent of Schools 
Gill, Colorado 


Introduction.—Since the early history of 
formal education there has existed some form 
of pupil measurement. In recent years the 
objectives of education have been broadened
so that such outcomes as attitudes, habits, 
and appreciations are considered as impor- 
tant for the proper kind of growth in social 
experiences as is subject matter or factual 
knowledge. With the emphasis being placed 
on these new objectives of education, it is 
important to know how closely teachers’ 
marks are related to standardized tests of 
achievement and native ability. 

Procedure.—The field of inquiry for this
study has been limited to the data gathered
in the Glenrock-Parkerton School System,
Glenrock, Wyoming; the Las Animas County
High School System, Branson, Colorado; and
the Grover-Hereford School System, Grover,
Colorado. Testing extended from the school
year 1932-33 through the school year 1936-37.
Each of the school systems included in this
study was under the direct supervision of the
writer. The procedure was uniform throughout
the entire study.

There were twenty-three teachers whose 
marks were included in the study. Of these, 
fourteen were high school teachers and nine 
were elementary school teachers. Only two of
the entire group of twenty-three teachers were
inexperienced.

There was a group of pupils at each of two 
levels of school work. The first group con- 
sisted of tenth and eleventh year pupils of 
the senior high schools and the second group 
was made up of fifth and sixth year pupils 
of the elementary schools. 

The tests selected and administered to the
high school pupils consisted of the Sones-Harry
High School Achievement Test and the Otis
Group Intelligence Scale. The tests selected
and administered to the elementary school
pupils were the New Stanford Achievement
Test and the Otis Group Intelligence Scale.
The authors and publishers of these tests
claim a high reliability for them.

* Brief summary of: C. C. Moore, The Relation of Teach-
ers’ Marks to Standardized Tests. Unpublished Doctor’s Field
Study, Number 1, Colorado State College of Education, Gree-
ley, Colorado, 1938.

The schools used in the study were what 
the progressive educationalists would term 
conservative. However, they were progressive 
to the extent that teachers were allowed a 
large degree of freedom in choosing the 
methods used. The state course of study
provided the only outline of a curriculum
officially imposed upon the teachers.

Teachers knew early in the year that 
standardized tests would be administered. 
However, none of the teachers had access to 
any standardized tests or their results before 
they were administered for the purpose of 
making this study. 

All of the teachers of high school subjects 
had standard four year degrees from accred- 
ited colleges. Three of them had earned the 
Master’s degree. Six of them had taken some
graduate work in education. All of them had 
at least twenty-two and one-half quarter 
hours in education and at least the number 
of hours in the subject taught equivalent to 
that termed a “minor” by many colleges. 


All of the elementary school teachers had 
at least two years of college work and three 
of them had the standard four-year Bach- 
elor’s degree. All had majored in education. 
All those who had less than a four-year de- 
gree had at least twenty-two and one-half 
quarter hours in education. 

The Findings.—The average number of
cases for the five groups of correlations
between teachers’ marks and achievement, as
measured by the Sones-Harry High School
Achievement Test, is 128. The coefficients
of correlation between teachers’ marks and
achievement are as follows: language and
literature .51; mathematics .36; natural
science .18; social science .30; and the average
of the above four subjects .54.

The average number of cases for the five 
groups of correlations between teachers’ 
marks and intelligence, as measured by the 
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Otis Group Intelligence Scale, is 132. The co- 
efficients of correlation between teachers’ 
marks and intelligence, as measured by the 
Otis Group Intelligence Scale, are as follows: 
language and literature .43; mathematics
.35; natural science .22; social science .39;
and the average of the above four subjects
.42.

The average number of cases for the five 
groups of correlations between achievement, 
as measured by the Sones-Harry High School 
Achievement Test, and intelligence, as meas- 
ured by the Otis Group Intelligence Scale, is 
138. The coefficients of correlation between 
achievement and intelligence are as follows: 
language and literature .54; mathematics .48;
natural science .19; social science .36; and 
the average for the four subjects .49. 

The average number of cases for the five 
groups of correlations between elementary 
school marks and the achievement, as meas- 
ured by the New Stanford Achievement Test, 
is 188. The coefficients of correlation between 
elementary school marks and this standard- 
ized achievement test are as follows: lan- 
guage and literature .44; mathematics .60; 
reading .56; social science .39; and the aver- 
age for the ten divisions of subject matter, 
as measured by the test, correlated with the 
average of the marks in the four divisions, 
is .61. 

The average number of cases for the five 
groups of correlations between elementary 
school marks and intelligence, as measured by 
the Otis Group Intelligence Scale, is 172. The 
coefficients of correlation between marks and 
intelligence are as follows: language and
literature .67; mathematics .59; reading .47;
social science .49; and the average of school 
marks and intelligence .69. 

The average number of cases for the five 
groups of correlations between elementary 
school achievement, as measured by the New 
Stanford Achievement Test, and intelligence, 
as measured by the Otis Group Intelligence 
Scale, is 172. The coefficients of correlation 
between elementary school achievement and 
intelligence are as follows: language and lit- 
erature .65; mathematics .68; reading .74; 
social science .54; and the average of achieve- 
ment, as measured by the ten divisions of the 
achievement test, and intelligence, as meas- 
ured by the standardized test, is .69. 
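The coefficients above are reported without the computing formula. A common choice for marks and test scores is the Pearson product-moment coefficient; the Python sketch below assumes that method, and assumes letter marks are first converted to points on a hypothetical A=4 through E=0 scale (the study’s actual marking scale is not given). The pupils’ marks and scores are invented for illustration.

```python
import math

# Hypothetical conversion of letter marks to points; not the study's scale.
MARK_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical teachers' marks and achievement-test scores for six pupils.
marks = ["A", "B", "B", "C", "C", "D"]
scores = [82, 75, 61, 70, 58, 49]
r = pearson_r([MARK_POINTS[m] for m in marks], scores)
print(round(r, 2))
```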

When the effect of mere chance on the
agreement is eliminated, the per cent of
agreement, inclusive of that due to a best
guess, is as follows: between 31 and 35 per
cent there are 9 correlations, between 36 and
40 per cent there are 8 correlations, between
41 and 45 per cent there are 7 correlations,
between 46 and 50 per cent there are 5
correlations, and between 51 and 55 per cent
there is 1 correlation.

Conclusions.—The correlations in this
study range from very low to medium high.
They agree rather definitely with the various
correlations obtained from previous studies
in the field.

There are many factors which have been 
responsible for the low coefficients of correla- 
tions found in this study. One of the most 
important of these is doubtless the fact that 
all factors which enter into the final marks 
awarded by teachers are not included in the
items which enter into the construction of 
standardized tests. 

Another important factor may be the
inadequate amount of preparation which
teachers have for carrying on their work with a
testing and marking program.

The data of this study point to the
conclusion that standardized tests do not measure
the same objectives of pupil achievement as
receive emphasis when teachers award marks
based upon their judgment of the pupil’s
growth, as modified by the results of
classroom tests.

There is approximately the same degree of
relation between intelligence and teachers’
marks as there is between achievement and
teachers’ marks, as far as these measures have
been determined within the scope of this
study.

If one of the objectives of the educational
program is to provide for and to promote the
individual interests of the pupil within the
group, objective tests devised for the purpose
of measuring the group will not yield accurate
results concerning the total achievement
of the individual pupil. Classroom tests
designed for the measurement of group interests
will probably penalize the pupil whose
interests and achievement fall outside the average
interest and achievement of the group. However,
if teaching objectives are to become
standardized to the extent that there are
ready-made responses for the group, or if there are
previously determined patterns to which all
the group should conform, then standardized
group tests will measure these responses more
adequately than teachers’ tests or teachers’
marks.
EDUARD BURGER AND JOHN DEWEY:


A Comparative Study of Burger’s Arbeitsschule and Contemporary
American Activity Schools as Representative of Dewey’s 
Educational Philosophy* 


Gustav G. SCHOENCHEN 
Principal, Public School No. 3 
The Bronx, New York City 





The Problem.—Eduard Burger, a promi- 
nent Austrian educator, having made a care- 
ful study of the history, development, and 
critique of the activity principle in education, 
was called upon by the post-war government 
to reorganize its educational system along 
activity lines. His labors, while they made
Vienna the Mecca of activity pedagogy for
pre-Hitler Germany and Austria, have never
been given due consideration in this country. 
The present investigation is an attempt to 
show the importance of Burger’s work and 
briefly to compare Burger’s pedagogical teach- 
ings with those of John Dewey. 


The Purpose of the Investigation.—1. To present and evaluate critically Burger’s educational teachings.

2. To show the origin, evolution, and influ- 
ence of his ideas. 

3. To summarize his distinctive contribu- 
tion to activity pedagogy. 

4. To compare his ideas of activity pedagogy with those of Dewey.


Significance of the Study.—Since the World War, people on both sides of the Atlantic have become dissatisfied with the school. The attempt to bring the school into closer contact with life and to improve its product has resulted in a new pedagogical set-up called Arbeitsschule in Germany and Austria, and Activity School in America. The significance of this study rests, therefore, upon two facts: first, that earnest and experienced pedagogues in widely separated places have suggested similar solutions to meet the educational problem; secondly, that in spite of similarity in suggested solutions there is still no agreement among educators as to what constitutes an activity program. By examining and evaluating a foreign program and comparing its methods with those current in this country, we may be enabled to see our own activity program in a clearer light and perhaps gain from European experiences.

* Abstract of a thesis presented to the Graduate Faculty of the School of Education of New York University in partial fulfillment of the requirements for the degree of Doctor of Philosophy, June, 1939.

Life of Burger.—Burger was born March 5, 1872, in Niederlichtenwalde, a tiny village in German Bohemia. Following in his father’s footsteps, he prepared himself for teaching, first at the provincial Teachers’ Academy at Leitmeritz, later at several famous universities. In 1913 he received the Ph.D. degree from Prague, writing his dissertation on “Activity, a Psychological and Pedagogical Principle.”

From 1890 to 1900 he taught in various Austrian cities at different levels, and contributed prolifically to various German and Austrian pedagogical journals. These writings and his activities at teachers’ conferences brought him considerable prominence, and he was invited to become professor of pedagogy at the Federal Pedagogical Institute at Innsbruck, where he served from 1900 to 1920. In 1914 appeared his principal work, Arbeitspaedagogik: Geschichte, Kritik, Wegweisung. Its importance was immediately recognized, but owing to the war no application of its recommendations was made. The book was reissued in much enlarged form in 1923.

In 1916 he became editor of the foremost Austrian educational journal, which he renamed Monatshefte fuer paedagogische Reform; in 1920 he was drafted into the Ministry for Education to put his ideas into practice in a thoroughgoing reorganization of the Austrian educational system. As Federal Inspector, he instituted activity education as the form of general education in all the schools of Vienna, organizing and training active and prospective teachers at the Pedagogical Institute of Vienna, and continuing his pedagogical writing in Die Quelle, as he had renamed the Monatshefte in 1920, when he dedicated the publication to the service of the Austrian reform movement. His services lasted until 1934, when he retired because he found himself out of sympathy with the new authoritarian regime of Dollfuss. He died four years later, on December 2, 1938, in Vienna.

Burger as Historian of Activity Pedagogy.—Burger shows that the structure of our
educational system is a combination of 
Pestalozzianism and Herbartianism. The ac- 
tivity school is not, therefore, the invention 
of a single man, but is the result of a slow 
growth toward which many educators have 
contributed. Chief among these are Comenius, 
Rousseau, Kindermann, Fichte, Pestalozzi, 
and Froebel. 

The activity school must be studied as a 
changing dynamic organism, differing at suc- 
cessive stages of development. Since the be- 
ginning of the twentieth century it has been 
affected by modern educational eclecticism in 
several ways: 

1. Toward a broadening of the concept of manual training.

2. Toward sharper definition and elaboration of its psychological bases.

3. Toward greater emphasis upon the communal or social nature of pedagogical activity.

4. Toward clearer recognition of the importance of individualization of instruction.

5. Toward greater emphasis upon nature and science as curricular material.

6. Toward more efficient organization and administration.

Burger as Critic of Activity Pedagogy.—A theoretical critique of any system of education would start by examining the nature of man and deriving the system from this analysis. Man as an individual is both a physical and a mental entity; man, generically, is both an individual and a member of an organism. This three-fold nature of man gives rise to three directional trends in education: the hygienic trend, stressing the physical aspect; the didactic trend, stressing the individual mentality; and the hodegetic trend, which stresses the implications of man’s social nature. Burger critically examines activity pedagogy and finds it superior to the traditional school in the light of each of these three directional trends.

From the practical side, these trends are found to express themselves in characteristic educational forms: the naturalistic, the individualistic, and the social, respectively. Examining activity pedagogy in the light of these characteristic forms of education, Burger concludes that activity pedagogy is indebted to naturalism for its methodology and curriculum; to individualism for its realization of the importance of individualization in instruction; and to social education for its knowledge of the importance of social living within the school, in a community-of-work type of internal organization.

Burger as Guide: Underlying Considerations.—1. The aim of the activity school should be the perfectibilian goal of the ideal man, for only in this unattainable goal do we find a constant aim, and one that is broad enough to include all other legitimate sub-aims of education.

2. The locale of pedagogical activity is the school.

3. Pedagogical activity takes two forms: it is a subject of the curriculum, usually called manual training; it is also a method of teaching any subject.

4. Activity pedagogy requires teaching skill of a very high order.

5. Pedagogical activity includes every application of mental or physical energy to overcome resistance whereby cultural values are created.

6. There is no irrepressible conflict between the traditional school and the activity school. On the contrary, since activity pedagogy is the only educational form which can successfully combine all the cultural currents of our times, it is destined to become the modern form of general education.

Burger as Guide: Method of Activity Pedagogy.—Activity method is heuristic, that is, aiding or guiding the child in discovery, seeking and finding; inciting to observation or invention. Heuristics is the exact antithesis and complement of technical instruction.

There being three mental categories involved in learning (ideation, judgment or reasoning, and the emotional-volitional), heuristics takes three forms:

Empirical heuristics stimulates pupil activity in the direction of concept and percept formation. It takes such characteristic forms as the collection, the excursion, personal observation, experimentation, manual and sense dis[…], the map, the lecture. In all of these, even the lecture, pupil activity is stressed.

Logical heuristics stimulates pupil activity in the formation of valid judgments and conclusions. It functions through choice, arrangement, and methodology of curricular material. It requires: cutting down overcrowded curricula; providing for unspecialized, integrated instruction in primary grades, with progressive differentiation into traditional curricular subjects as the pupil advances; arrangement of curricular material according to the genetic principle; judicious use of, but not chief reliance on, opportunistic teaching; use of the inductive method; working out a special rational method for each curricular subject; adapting the form of drill to individual pupil characteristics; curtailing the use of the catechetical form of recitation in favor of the dialogue type; and setting up rather large units as “activities” or “projects,” properly subdivided into partial problems at which the pupil works self-actively toward a self-determined goal.

Technical heuristics stimulates pupil activity in the direction of self-expression. Its most characteristic form is oral or written language, but other forms, such as the dance, the model, various types of manual training, art, music, and dramatics, are possible. When a curricular subject serves also as a means of self-expression (e.g., drawing as a means of self-expression in geography), it becomes method as well as remaining a curricular subject.


Contrast between Dewey and Burger Sum- 
marized.—Contrasts are as follows: 


1. As to the Nature of Education

Dewey defines education as a social neces- 
sity taking the form of directed activity to 
insure proper individual growth for social con- 
tinuity. Burger defines activity education as 
that form of education which accepts the 
activity principle as basic for establishing 
aim, method, and content. 

2. As to Underlying Philosophy 

Dewey regards philosophy as a method for 
determining appropriate action; philosophy is 
the theory of education as a deliberately con- 
ducted practice. This involves concepts of the 
nature of knowledge and the basis of moral 
conduct concerning which Dewey’s position is 
that of the thoroughgoing pragmatist. Burger 
uses philosophy as the basis and justification 
for his educational system; he accepts the traditional view of the nature of knowledge, and, unlike Dewey, appeals to ethics and religion for sanctions on which to erect his theories of character training and morality.

3. As to Aim 

Dewey accepts social efficiency as the aim of education, placing upon education the responsibility for the creation of the ideal society, democratic in form but with progressively nobler social institutions. For Burger the aim of education is the creation of the ideal man; while he does not place upon education the responsibility for creating a nobler social order, he thinks of a better society as a by-product of a better man.


4. As to Method 

While there is nothing in Dewey to com- 
pare with the detailed description of activity 
methodology as such given by Burger, the two 
men are in essential agreement in the way 
they view the problem of method. Method 
and subject matter are inextricably related: 
method may be either general or individual; 
general methods consist of practices found 
useful in the past, but care should be taken 
not to let methods dominate the educational 
process. 

Individual method is differently viewed by 
the two educators, Dewey characterizing it 
by its results, Burger by its processes. Method 
is closely related to discipline, but, while for 
Dewey discipline and interest are one, for 
Burger discipline is also part of the larger
field of moral training. Method boils down
to teaching the pupil to think—a process 
which Burger discusses in greater detail than 
Dewey but in essential agreement with him. 


5. As to Subject Matter 

Subject matter is everything connected 
with educational experience; it is of three 
kinds—skills, knowledge of things, ideas— 
but while, for Dewey, these kinds are also 
levels of learning, Burger thinks of them as 
united on all levels. Subject matter should be 
presented in large units of activity; the social 
sciences should be stressed. Subject matter 
should be presented psychologically rather 
than logically. 

6. As to Outcomes 

Dewey and Burger are in close agreement. 
Outcomes may be either intrinsic or instru- 
mental values. They fit men for labor and 
leisure, for doing and knowing, for vocation 
and culture. They are humanistic and scien- 
tific. They result in individual freedom, but 
Burger and Dewey differ as to the meaning
of individual freedom. Finally, according to
Dewey these outcomes result in improve- 
ments in society, but according to Burger 
they result in improvements in the individual. 

Conclusions.—1. That Burger’s work as historian, critic, and guide in activity pedagogy has, through its definiteness and clarity, advanced the cause of modern education.

2. That the determination of an appropri- 
ate aim is a necessary step for any educational 
system to achieve effectiveness. 

3. That true individualization consists not 
in letting the child decide for himself what 
he wishes to learn, but in adapting the learn- 
ing process to the learner's individual char- 
acteristics. 

4. That while frequent revision of the curriculum is desirable, the curriculum itself must not be abandoned.

5. That socializing method and activities, 
while admittedly valuable, cannot become our 
sole reliance in moral training. 

6. That opportunistic teaching must be the 
exception rather than the rule in an effective 
educational program. 

7. That in his description of the three 
forms of heuristic methodology, Burger has 
given us the clearest, most detailed, and 
soundest application of the principle of pupil
self-activity.
THE MEASUREMENT OF ABILITY IN CAPITALIZATION
AND PUNCTUATION*


GERALD V. LANNHOLM 
University of Cincinnati 


THE PROBLEM 


1. INTRODUCTION 


A relatively large number of different 
kinds of objective techniques have been used 
for the measurement of ability in capitaliza- 
tion and punctuation. As yet, little is known 
of the relative effectiveness of these tech- 
niques. Particularly little evidence is avail- 
able concerning the relative effectiveness of 
the techniques recently introduced especially 
to secure more economical scoring of tests. 


The major purpose of this study was to 
make an objective evaluation of certain self- 
administering types of capitalization and 
punctuation tests. The types of tests selected 
for evaluation are described in the following 
sections. 


2. DESCRIPTION OF THE EXPERIMENTAL TESTS¹


a. The Capitalization Tests


Six different techniques for the measure- 
ment of ability in capitalization were selected 
for evaluation. The criteria of selection were: 
(1) the tests must be purely objective;
(2) the tests must be self-administering; and 
(3) the tests must either be similar to ones 
used in standardized tests, or give promise of 
value because of possibility of adaptation to 
rapid-scoring techniques. 


The tests selected on the basis of these cri- 
teria will now be described in terms of a 
reproduction of the directions to the pupil, 
together with a sample exercise for each. 
Following the presentation of each sample 
exercise, brief comments will be made con- 
cerning each test. 


* The main content of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the College of Education, in the Graduate College of the State University of Iowa, June, 1939. The author is deeply indebted to Dr. E. F. Lindquist, Professor of Education in the State University of Iowa, for stimulating counsel during the progress of this investigation, and for the suggestion of the problem itself.

¹ A copy of each of these tests will be found in Appendix C of the complete study, on file in the College of Education Library of the State University of Iowa.
Form A—
Directions: All capital letters have been 
omitted from the sentences in this test. You 
are to indicate where you think capital let- 
ters should be used by drawing a heavy 
vertical line through the first letter in each 
word which should be capitalized. There 
may be more than one word to be capitalized 
in any sentence. Be careful not to mark any 
words which should not be capitalized. The 
sample exercise has been marked correctly.** 


Sample: when did napoleon die? 


Comments: In this test, the pupil must find 
the capitalization situations for himself and 
“supply” the necessary capitals in the test 
copy itself. 


Form B— 

Directions: This test consists of a number 
of sentences in which there are two kinds 
of capitalization errors. In some words there 
is no capital where there should be one. In 
some other words there may be a capital 
letter where one does not belong. You are to 
go through these sentences and indicate
where capitalization errors have been made. 
Do this by drawing a vertical line through 
the letter in which the error occurs. Some 
sentences contain more than one error and 
some do not contain any errors. The sample 
exercise has been marked correctly.** 


Sample: Henry johnson has a new Hat. 


Comments: This test also requires the 
pupil to find the capitalization situations for 
himself. It differs from Form A in that the 
pupil must correct the errors, whereas in 
Form A he merely “supplies” the missing 
capitals. 


Form C— 
Directions: This test consists of a number 
of sentences in which many of the capitals 
have been omitted. Five of the words in each 
sentence are numbered. Study each sentence 
and decide which of the numbered words 
should be capitalized. Note the number un- 
der each word which should be capitalized. 
Then in the proper row on the answer sheet 
which is inserted in your test booklet, place 
a heavy pencil mark between the pair of 
vertical lines which is numbered the same 
as the word. In some sentences, more than one of the numbered words should be capitalized. The sample exercise has been marked correctly.

** Italicized letters indicate those which had vertical lines drawn through them. These vertical lines could not be reproduced in type.


Sample: On Test Page
1. John went to chicago last august.
   (five of the words are numbered 1 to 5)
On Answer Sheet
1.   1   2   3   4   5

Comments: This test was included in the 
study because it is adapted to the use of the 
separate answer sheet which is fast becoming 
popular. The answer sheet can be scored 
either manually with a stencil-type key or 
with an electrically operated scoring machine. 
The number of error situations per sentence 
is limited and the test may be somewhat open 
to guessing. 


Form D— 

Directions: In this test, each sentence is 
broken up into two or more lines. Each line 
is numbered. Many of the lines contain an 
error in capitalization. You are to study 
each line carefully and decide whether or 
not it contains a mistake in capitalization. 
If the line contains no capitalization error, 
put a check mark under the “R” at the end 
of the line. If you find an error in capitaliz- 
ation in the line, put a check mark under the 
“W" at the end of the line. The sample 
exercise has been marked correctly. 


                                       R   W
Sample: 1. When john went to               ✓
        2. Chicago last month, he      ✓
        3. saw several good Shows.         ✓


Comments: This test is easily scored and 
is economical of administration time. A sepa- 
rate answer sheet could be used with this type 
of test. Its disadvantages are that it is limited 
to one error per line and that it is open to 
guessing to a large degree. 


Form E— 

Directions: This test consists of a number 
of sentences in which many capitals have 
been omitted. Five of the
words in each sentence are numbered. Study 
each sentence and decide which of the num- 
bered words should be capitalized. Note the 
number under each word that should be cap- 
italized. Then at the end of the sentence 
place a check mark in the blank numbered 
the same as the word. In some sentences 
more than one of the numbered words may 
need to be capitalized. The sample exercise 
has been marked correctly. 


Sample:
John went to chicago last august.
1 2 3 4 5
Comments: This test differs from Form C in but one respect. Instead of making his responses on a separate answer sheet, the pupil makes his responses on the test page itself. This test is easily scored and is fairly economical of administration time. Its disadvantages are that the test author is limited in the number of situations he may include in each line or sentence and that it may be somewhat open to guessing.


Form P— 
Directions: In the sentences in this form many capital letters have been omitted. You are to read each sentence carefully and then study each word which has a number printed under it. If you think this word should begin with a capital letter, mark a cross in the pair of parentheses under the “C” on the answer sheet opposite the number of the word. If you think the word should begin with a small letter, place a cross in the parentheses under the “s”. Do not pay any attention to the words that do not have numbers under them. The sample exercise has been marked correctly.


Sample: On Test Page
Did john like your speech?
(two of the words are numbered 1 and 2)
On Answer Sheet
        C     s
1.     (X)   ( )
2.     ( )   (X)


Comments: This test was also included in 
the study because it employs a separate an- 
swer sheet. The answer sheet used in this case 
was mimeographed and scored manually with 
a stencil-type key. It could easily be prepared 
for machine scoring. The disadvantage of the 
test is that it may be open to guessing. 
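The comments above note that the Form P answer sheet was scored manually with a stencil-type key. The study does not spell out its scoring rule, so the short Python sketch below assumes the simplest one: one point for each numbered word marked in the keyed column, nothing for an omission or a wrong mark, and no correction for guessing. The function name `score_with_key` and the sample key and answer sheet are hypothetical.

```python
def score_with_key(responses: dict[int, str], key: dict[int, str]) -> int:
    """Count items whose marked column agrees with the key.

    Assumed rule (not stated in the study): rights only, so a blank
    or a mark in the wrong column earns nothing and nothing is deducted.
    """
    return sum(1 for item, answer in key.items() if responses.get(item) == answer)


# Hypothetical key and pupil sheet: "C" = capital letter, "s" = small letter
key = {1: "C", 2: "s", 3: "C", 4: "C", 5: "s"}
pupil = {1: "C", 2: "C", 3: "C", 5: "s"}  # item 2 marked wrongly, item 4 left blank
print(score_with_key(pupil, key))
```

Holding the key as a mapping from item number to the correct column mirrors what a stencil does physically: it exposes only the keyed position for each item, so the scorer counts visible marks.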


b. The Punctuation Tests 


Nine different types of objective punctua- 
tion tests were chosen for evaluation, using 
the same criteria employed in selecting the 
capitalization tests. These tests will be de- 
scribed in the same manner as were the 
capitalization tests. 


Form F— 

Directions: This test consists of a number 
of sentences with all of the punctuation 
marks omitted. You are to go through the 
sentences carefully and place the proper punctuation marks wherever you think they are needed. Make the marks very distinct and be sure to place them exactly where they belong. Do not put marks where no punctuation is needed. The sample exercise has been marked correctly.


Sample: Isn’t John going to school with us? 
Comments: The advantages of this type of test are: (1) it requires the pupil to locate the punctuation situations for himself without any suggestions provided; (2) it requires him to supply the necessary punctuation; and (3) it eliminates to a large degree the element of guessing.
Form G—
Directions: Many of the sentences in this test contain one or more punctuation errors. You are to go through the sentences carefully and indicate where you think a punctuation error has been made. If a punctuation mark has been omitted, put it in where it belongs. If a wrong punctuation mark has been used, cross it out by drawing a vertical line through it and put the correct mark beside it. Do not try to make the correct mark out of it. There may be more than one punctuation error in many of the sentences. Some of the sentences may not contain any punctuation errors. The sample exercise has been marked correctly.*

Sample: I like a man; who isn’t afraid of work?.
Comments: This test also requires the pupil to find the situations for himself. It differs from Form F in that the pupil must also correct errors instead of just supplying missing punctuation marks. It also rules out much of the element of guessing. Its disadvantage is that it requires the pupil to look for errors rather than to supply needed punctuation.


Form I— 


Directions: The sentences in this test are 
broken up into two or more lines. Many of 
the lines contain a punctuation error. Five 
of the words in each line are numbered. 
Study each line carefully and decide where 
additional or different punctuation is needed. 
Find the numbered word after which, or in
which, an additional or different punctuation
mark is needed. Place a check mark at the 
end of the line in the blank numbered the 
same as this word. No line contains more 
than one error. Some of the lines may be 
correct as they are. The sample exercise has 
been marked correctly. 


Sample (the five words in each line are numbered 1 to 5 in order; the checked blank is shown in brackets):

“He doesnt need to go              1  [2]  3   4   5
home right away” said she,         1   2  [3]  4   5
“because it is still early?”       1   2   3   4  [5]
Comments: This test requires the pupil only to locate the place where the punctuation error occurs. The sentences may be somewhat difficult to read, since each is broken up into two or more lines. This type of test limits the test builder to only a few situations per line.

* Italicized letters indicate those which had vertical lines drawn through them. These vertical lines could not be reproduced in type.


Form J— 
Directions: The sentences in this test are 
broken up into two or more lines. Many of 
the lines contain a punctuation error. Five 
of the words in each line are numbered. 
Study each line carefully and decide where 
additional or different punctuation is needed. 
Find the numbered word after which, or in
which, an additional or different punctuation
mark is needed. Then in the proper row on 
the separate answer sheet which is inserted 
in your test booklet, place a heavy pencil 
mark between the pair of vertical lines 
which is numbered the same as this word. 
No line contains more than one error. Some 
of the lines may be correct as they are. The 
sample exercise has been marked correctly. 


Sample: On Test Page
1. “He doesnt need to go
2. home right away” said she,
3. “because it is still early?”
(The five words in each line are numbered 1 to 5 in order.)

On Answer Sheet (the marked pair of lines in each row is shown in brackets)
1.   1  [2]  3   4   5
2.   1   2  [3]  4   5
3.   1   2   3   4  [5]


Comments: This test differs from Form I 
only in that the blanks do not appear on the 
test page, but on a special answer sheet 
adapted to machine scoring. It is included in 
the study for this reason. This type of test 
also limits the test builder to only a few 
situations per line. 


Form K— 

Directions: This test consists of a number 
of sentences in which punctuation marks are 
needed. In many places in the sentences, 
three punctuation choices are given. After 
studying each sentence carefully you are to 
consider each of these places separately, de- 
cide what punctuation is necessary at this 
place, and then underline the proper punc- 
tuation. In some places, punctuation marks 
may not be needed. In these places, you 
should underline the pair of parentheses 
which does not enclose any punctuation 
marks. The sample exercise has been 
marked correctly. 


Sample: (Isnt) (Isn’t) (Is’nt) John going to ( ) (,) (-) school with us (.) (:) (?)


Comments: This test is economical of ad- 
ministration time. It may be scored readily 
for position only by means of a stencil-type
key. It is a recognition test and may be open 
somewhat to guessing. It may also be rela- 
tively difficult to read. 


Form L—

Directions: This test consists of a number 
of sentences in which punctuation marks are 
omitted. Under several places in each sen- 
tence you will find a small number. Each of 
these numbers represents a place where 
punctuation may or may not be needed. On 
the answer sheet inserted in your test book- 
let, you will find several punctuation marks 
suggested opposite each of the numbers. The 
“N” opposite many of the numbers means 
no punctuation. After reading each sentence, 
you are to consider each numbered place 
separately, decide what punctuation is neces- 
sary at this place, and then underline the 
proper punctuation opposite the same num- 
ber on the answer sheet. In “underlining” 
each choice, draw a short, heavy line com- 
pletely filling the space between the parallel 
lines. The sample exercise has been marked 
correctly. 


Sample: On Test Page
Tim our pet dog died last night
(four places numbered 1 to 4)

On Answer Sheet
(suggested punctuation marks opposite each of the numbers 1 to 4, with “N” for no punctuation)


Comments: This test can be scored manu- 
ally with a stencil-type key, or the answer 
sheet can be prepared for machine scoring by 
special printing. It draws specific attention to 
the error situations and to that extent avoids 
proof reading difficulties. It is a recognition 
test, and may be somewhat open to guessing. 


Form M— 


Directions: Each of the sentences in this 
test contains a number of punctuation 
marks. Some of these marks are needed to 
make the sentences correct; others are not. 
You are to study each sentence and decide 
what punctuation is necessary. If a punctu- 
ation mark is needed, draw a short, heavy 
line between the parallel lines below it. If 
more than one mark is needed at any place, 
be sure to underline the choices that will put 
the marks in the correct order. Do not un- 
derline any marks that are not needed. The 
sample exercise has been marked correctly. 


Sample: Tim, our pet, dog, died last night.?


Comments: By careful arrangement, a test 
of this type can be printed on sheets which 
are adapted to machine scoring. Or the test 


can be scored manually with a stencil-type key. The test is economical of administration time. It also draws specific attention to the error situations and to that extent avoids the proof reading problem. It may be open to guessing to a considerable degree.


Form N—
Directions: This test consists of a number of unpunctuated sentences. Above many places in the sentences, punctuation marks are printed in parentheses. In some places, only one choice of marks is given; in other places, more than one choice is given. In many cases, punctuation marks are printed above the sentence even though no punctuation is needed at that place. You are to read each sentence carefully and then decide where punctuation is needed. Wherever you think that no punctuation is needed, pay no attention to the parentheses above, but go on to the rest of the sentence. At those places where you think some punctuation is needed, look at the punctuation given in the parentheses above and encircle the number printed above the correct punctuation. Then turn to the answer sheet which is inserted in your test booklet and mark a cross in the parentheses opposite the same number as the one you encircled on the test page. The sample exercise has been marked correctly.


Sample: On Test Page (encircled numbers are shown in brackets)
[1]   2   [3]   [4]   5
(,)  (,)  (,)   (.)  (?)
Tim our pet dog died last night

On Answer Sheet
1. (X)    3. (X)    5. ( )
2. ( )    4. (X)    6. ( )


Comments: In this study, a mimeographed 
answer sheet was used with this test and 
scored manually with a stencil-type key. An 
answer sheet could easily be prepared for 
machine scoring. Since it indicates the error 
situations specifically, this test may to some 
extent avoid proof reading difficulties. The 
test is probably not highly economical of
administration time and it may be somewhat
open to guessing. 


Form O— 

Directions: This test consists of a number of unpunctuated sentences. Above many places in the sentences, punctuation marks are printed in parentheses. In some cases, punctuation marks are printed above the sentence even though no punctuation is needed at that place. Read each sentence carefully and then, at each place where a bracket is above the sentence, decide what punctuation, if any, is needed at that place. If no punctuation at all is needed, place a small cross under the (N) at the left end of the bracket. If you think some punctuation is needed, place a cross under the correct punctuation. Do not worry about the places that do not have any punctuation marks above them. The sample exercise has been marked correctly.
Sample: (N)(,)   (N)(,)   (N)(,)   (N)(.)
           x        x        x        x
Tim our pet dog died last night
Comments: This test should be more eco- 
nomical of administration time than Form N. 
It also draws specific attention to the error
situations and to that extent avoids proof 
reading difficulties. It is a recognition test 
and may be somewhat open to guessing. 


THE PROBLEM 


It is the purpose of this study to evaluate 
the several types of objective capitalization 
and punctuation tests just described. The 
principal factors considered in the evaluation 
are validity, reliability, and the time required 
for administration. The validity of each of 
the tests will be determined by computing the 
correlation between the scores on the experi- 
mental test and those on a dictation test con- 
taining exactly the same sentences used in 
each of the experimental tests. 

In addition, two important factors related 
to the measurement of punctuation ability 
will be investigated. One of these studies 
deals with the importance of the proof read- 
ing factor in an objective test in punctuation. 
The other has for its purpose a study of the
effect of pupil practice upon the validity of a 
new type of objective test in punctuation. 
Each of these studies will be described in 
detail later in this report. 


THE CRITERION TEST 


1. REQUIREMENTS OF AN ACCEPTABLE 
CRITERION 


Probably the most satisfactory method of 
determining the validity of a given test is to 
relate it to an independent criterion measure 
of the ability or abilities to be tested. Before 
the criterion is selected, the general require- 
ments of an acceptable criterion measure of 
ability in capitalization and punctuation
should be considered.

First of all, the manner in which the error 
situations are presented to the pupil in the 
criterion test should be as nearly as possible 
like that in which they are presented to him 
in actual writing. If the criterion tests skills 
not involved in actual writing, certain of the
tests may appear to be effective merely be- 
cause they are measuring these irrelevant 
abilities. 

A second essential is that the criterion 
should include an adequate sampling of all 
of the important writing situations in which 
it is known that pupils make frequent or 
crucial errors in capitalization or punctua- 
tion. If the criterion includes an undue pro- 
portion of a certain type of situation, it would 
favor in the comparisons any type of test that 
happened to be particularly effective in meas- 
uring that type of skill. 

Finally, the criterion should present the 
same situations to all pupils being tested. 
That is, every pupil must be required to re- 
spond to the same number and kind of situ- 
ations. A test which measures certain skills 
for some pupils and other skills for other 
pupils will obviously not provide comparable 
measures of achievement. 

It is not an essential of the criterion that
it be readily administered or scored. The sole
consideration is validity, and for experimental
purposes, the investigator can afford to devote
to securing a good criterion measure an amount
of time and effort that would be impracticable
in the actual testing situation.

Possible procedures for securing a valid in- 
dependent measure of ability in capitalization 
or punctuation include the use of pupils’ free 
writing and the use of a dictation test. These 
procedures will now be examined to deter- 
mine how well they satisfy the requirements 
already set forth. 


2. CRITERION MEASURES BASED UPON
FREE WRITING


a. The Error Count 

One possible procedure for securing a cri- 
terion measure of ability in capitalization or 
punctuation is to secure a sample of the actual 
writing of the pupil and to count the errors 
he makes. This procedure has been used by 
some investigators* as a criterion measure of
ability in capitalization and punctuation.

The error count based on free writing pos-
sesses only one of the essential characteristics
of an acceptable criterion. The first essential,
that the situations be presented as in actual
writing, is obviously satisfied in this case,
since the error count is based upon the writing
itself. However, its failure to satisfy the
other requirements constitutes a serious ob-
jection to the use of the error count in free
writing as a criterion.

* See Powell (5) and Willing (6).

The requirement that the criterion should 
include an adequate sampling of the impor- 
tant capitalization and punctuation situations 
is not satisfied when the free writing of pupils 
is used to secure the criterion measure. This 
procedure actually secures only a measure of 
the pupil's ability to use correctly those skills 
which he does use. The pupil is not required 
to demonstrate his ability in any given num- 
ber or kind of situations. If he is not familiar 
with the correct form in one situation, he 
may avoid it by substituting a form with 
which he is familiar. In other words, the
pupil is actually given the privilege of con- 
structing his own test. 

Finally, because in free writing each pupil 
decides for himself what situations he uses, 
an error count based upon free writing will 
not secure comparable measures of achieve- 
ment for all of the pupils. The variability in 
the number and kind of situations used by the 
individual pupils will be as great as their in- 
dividual differences, not only in language 
ability, but in habits, interest, attitudes, and 
the like. In fact, pupils of superior ability 
may actually make more errors than those of 
inferior ability simply because they attempt 
more difficult forms. 


b. The Error Quotient

Another way of deriving a measure of abil- 
ity in capitalization and punctuation from 
free writing is to determine what is known as 
an error quotient. This is derived by divid- 
ing the number of errors the pupil makes by 
the total number of opportunities for error 
that he creates. This procedure gets away 
from only one limitation of the error count 
procedure. Instead of showing only how 
many errors the pupil makes in those forms 
he attempts, the error quotient shows what 
proportion of his attempts are wrong. That 
the use of the error quotient does not im- 
prove the measure secured was found by 
Powell (5, p. 85), who obtained a correlation
of .98 between the total number of errors and
the error quotient derived from over six hun- 
dred thousand running words of free writing 
secured from 302 ninth grade pupils in Iowa 
high schools. He concludes that “. . . this
correlation establishes conclusively that the
error count and the error quotient procedures
are measuring almost exactly the same func-
tions.”
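The difference between the two free-writing measures can be made concrete with a small sketch. It assumes a hypothetical per-pupil tally (`WritingSample`, with `errors` and `opportunities` fields) that is not part of the original study, and the figures in it are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class WritingSample:
    """Hypothetical tally for one pupil's free writing."""
    errors: int          # capitalization/punctuation errors counted
    opportunities: int   # situations the pupil created in which an error was possible


def error_count(sample: WritingSample) -> int:
    """Error count: simply the number of errors made."""
    return sample.errors


def error_quotient(sample: WritingSample) -> float:
    """Error quotient: errors divided by the opportunities for error."""
    if sample.opportunities == 0:
        raise ValueError("no opportunities for error in this sample")
    return sample.errors / sample.opportunities


# Hypothetical pupils: the second attempts more forms and so makes more
# errors, yet errs on a smaller proportion of the situations he creates.
cautious = WritingSample(errors=6, opportunities=40)
ambitious = WritingSample(errors=9, opportunities=90)
print(error_count(cautious), error_quotient(cautious))    # 6 0.15
print(error_count(ambitious), error_quotient(ambitious))  # 9 0.1
```

Whichever number is used, the pupil still decides which situations appear in his writing, so neither measure answers the sampling objection raised above.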

Since it is also based upon free writing, the
error quotient procedure does not satisfy the
second and third essentials set forth for an
acceptable criterion. That is, the use of free
writing as a measure of ability in capitaliza-
tion and punctuation does not result in an
adequate sampling of the important skills,
and it does not result in comparable measures
of achievement for all of the pupils.

Because they do not satisfy all of the re-
quirements of an acceptable criterion, both
the error count and the error quotient proce-
dures must be rejected as unsatisfactory for
use as a criterion measure of ability in cap-
italization and punctuation.


3. THE DICTATION TEST


Another procedure which may be used to
secure an independent measure of the ability
of the pupil to use capitalization and punctu-
ation correctly is to dictate prepared materials
to the pupils. The pupils are instructed to 
copy the sentences in their own handwriting 
and to supply the correct capitalization and 
punctuation as they write. The papers are 
then checked for errors in the use of these 
skills. 

If the dictation test is to be a satisfactory 
criterion, it must first of all parallel as closely 
as possible the actual writing situation. It 
may be well, therefore, to consider in more 
detail just what the actual writing situation 
is like. Mastery of capitalization and punctu- 
ation skills is quite generally conceived of as 
the ability to use them correctly in free writ-
ing. An individual who has this mastery may
be said to have formed correct habits of writ-
ing. In actual writing, he is not conscious of
capitalization and punctuation as such. The
situations are not labeled, and the writer does
not stop at various points in his writing to
remind himself of a rule. Rather, he allows
his automatic habits to function. If he does 
not, then he has not achieved mastery of these 
skills. In a sense, then, capitalizing and punc- 
tuating are as integral a part of handwriting 
skill as the formation of letters and words. 

The free writing situation cannot be dupli- 
cated entirely when materials are dictated to 
the pupils, because in the latter case the pupil 
does not choose his own sentence content. In- 
stead he is provided with the content. Once 
he has been provided with this content, from 
that point on the process is the same as if he
had thought of the idea himself. In the dic-
tation test, as in free writing, the situations
in which errors are likely to be made are not
labeled, and the writer is not necessarily con-
scious of capitalization and punctuation as
such as he copies the dictation. In copying
from dictation, he does not necessarily stop
to remind himself of the rules involved, but
may allow his automatic writing habits to
operate much as they do in free writing. In 
view of the above reasoning, the dictation 
test may be said to satisfy in the most impor- 
tant respects the essential that the criterion 
should present the error situations to the 
pupils in a manner that is as nearly as pos- 
sible like that in which they are presented to 
him in actual writing. 

The dictation test can clearly be made to
satisfy the requirement that the criterion
should include an adequate sampling of the
important capitalization and punctuation sit-
uations. The test builder can prepare the ma-
terial to include any desired number of each
of the different kinds of situations. In other
words, when the dictation test is used, the
number and kind of skills to be measured are 
subject to control by the examiner. This is 
not possible in the case of the free writing 
techniques. 

Finally, the dictation test satisfies the third 
essential of an acceptable criterion measure, 
that it should present the same situations to 
all pupils being tested. When the dictation 
test is used, all pupils are tested upon the 
same situations, and not only upon the ones 
they choose individually to employ, as is the 
case when the procedures based upon free 
writing are used. Each pupil is required to 
react to each situation and cannot, with im- 
punity, avoid situations over which he does 
not have control. 

It is recognized that the use of the dicta- 
tion test as a criterion measure of ability in 
capitalization and punctuation has certain 
limitations. These limitations are listed below: 


1. The material of the dictation test is not
of the pupil’s own composition. There 
are probably many equally satisfactory 
ways of expressing the same idea. When 
the pupil is forced to express an idea in 
a fashion that is not his own, situations 
may be created that would not occur in 
his actual writing. That is, when per- 
mitted to write the idea in his own way 
he might capitalize and punctuate cor-
rectly, but when forced to use another's
diction, he might make errors. The 
effect of this limitation can be mini- 
mized, however, by constructing sen- 
tences similar to those the pupil himself 
would be likely to write. 

2. Pupils may have difficulty in grasping 
the meaning of the sentences dictated. 
Their errors may thus be caused by lack 
of comprehension of the sentence rather 
than by their inability to use correct 
punctuation as such. The purpose of 
punctuation is to aid in clarifying the 
meaning of the sentence. In the pupil’s 
own writing, the meaning is always clear 
to him because he knows what he wants 
to say. It may not be clear when the 
sentence is not of his own composition. 
The effect of this limitation can be mini- 
mized: (1) by being careful to avoid 
ambiguities and other factors that would 
prevent the pupil from grasping the 
meaning of the sentence readily; (2) by 
reading the sentences naturally and in 
such a manner that the meaning is 
clearly evident; and (3) by giving the 
pupils adequate opportunity to grasp 
the total meaning of the sentence before 
asking them to copy it. 


3. Some pupils may have difficulty in hear- 
ing the material as it is dictated. This 
limitation can be overcome by dictating 
the material loudly and distinctly
enough that each pupil can hear and 
understand each word. 


Other procedures for securing a criterion 
measure of pupils’ ability in capitalization and 
punctuation were also considered. These in- 
clude printed material with all capitals and 
punctuation omitted, and tests of rules for 
the use of these skills. It will be obvious to 
the reader that none of these techniques sat- 
isfies the requirements herein set forth for an 
acceptable criterion measure of these language 
skills. The use of the dictation test is, there- 
fore, considered the most acceptable proce- 
dure for securing a criterion measure of abil- 
ity in capitalization and punctuation. 


4. DESCRIPTION OF THE CRITERION TEST 


The dictation test used as the criterion in
this investigation consists of eighty-eight sen-
tences, of which the following are a represent-
ative sample.†

1. Men who work hard need all the
rest that Labor Day gives them.
2. If you had seen the boy’s face at
two P.M., you could not doubt him.
3. Henry, I want your assistance, not
just your permission to do it.
4. Helen, who was especially frightened,
screamed, “Oh, don't let it bite me!”
5. Then Miss Johnson told us about
apples, oranges, peaches, and pears.
6. Before he started teaching school,
Mr. Wilson went to Coe College.


The test is made up of four sections, each 
of which contains twenty-two sentences sim- 
ilar to those given above. Before the sen- 
tences were constructed, a careful analysis 
was made of the studies conducted by several 
investigators of pupils’ use of capitalization 
and punctuation in their own writing. The 
situations incorporated in these sentences 
were selected on the basis of this analysis. 

The number and kind of these “planted” 
situations are the same for each of the four 
sections of the test. In other words, these are 
equivalent sections. Each of the four sections 
of the dictation test includes: (1) twenty-nine 
“planted” situations requiring capitals and 
five not requiring capitals, but in which it is 
known that pupils tend to use capital letters; 
and (2) thirty-four “planted” situations 
which require punctuation and eight which 
do not require certain punctuation, but in 
which pupils frequently use punctuation. 
Thus the test measures not only the pupil’s 
ability to use certain needed capitalization 
and punctuation correctly, but also any tend- 
ency he may have to over-capitalize or to 
over-punctuate. 

It is entirely possible that another investi- 
gator, working independently on a similar 
problem, might arrive at a somewhat different 
list of situations. It does not seem reasonable, 
however, that his list would differ greatly 
from the one used in this study. Furthermore, 
since the same list of situations is included in 
each of the techniques to be evaluated, it does 
not seem reasonable that the relative effective- 
ness of the various tests would differ if any 
other carefully selected distribution of situ- 
ations were used. 


† These sentences are taken from Section A of the criterion
test. A copy of the complete test may be found in Appendix
C of the complete study on file in the College of Education
Library of the State University of Iowa.
Since the manner in which this dictation
test was administered is an integral part of
the description of the criterion used in this
investigation, the reader is advised to base his
evaluation of the test upon a consideration of
both the content of the sentences and the
manner in which they were dictated to the
pupils. The method used in dictating the
materials included in the criterion test is
described later in this article.


THE PROCEDURE 
1. NATURE OF THE INVESTIGATION 

This study consists of a series of independ- 
ent experiments in each of which two of the 
experimental tests in capitalization or punctu-
ation are evaluated. Table I shows how the
various types of tests were paired in each of
the experiments.


TABLE I

THE TESTING TECHNIQUES EVALUATED IN THE
VARIOUS EXPERIMENTS

Experiment
Number       Testing Techniques Evaluated

Capitalization Tests:
   1         Forms A and B
   2         Forms A and C
   3         [illegible]
   4         Forms C and E
   5         Forms A and E
   6         Forms A and P

Punctuation Tests:
   7         [illegible]
   8         Forms F and I
   9         Forms F and J
  10         [illegible]
  11         Forms F and K
  12         [illegible]
  13         Forms F and M
  14         [illegible]


Because Form A has probably been more 
widely used than any of the other capitaliza- 
tion tests, and because it is more nearly like 
the criterion test than is any of the others, it 
is paired with each of the forms in independ- 
ent experiments. Forms C and E are com- 
pared to determine the effect of the use of 
the separate answer sheet. Because Form F 
has been used widely, and because it is the 
only one of the punctuation tests that re- 
quires the pupil actually to supply punctua- 
tion where it is needed, it is paired with each 
of the other forms in independent experi- 
ments. Forms I and J are compared to deter- 
mine the effect of the use of the separate 
answer sheet in a punctuation test. 
In each of the experiments, public school
pupils in grades seven and eight constituted
the experimental sample. With but two ex-
ceptions, it was necessary to use classes from
more than one school in order to secure the
desired number of pupils for testing. Table
II gives a description of the samples used in
each of the experiments.


TABLE II

DESCRIPTION OF THE EXPERIMENTAL SAMPLE IN
EACH OF THE FOURTEEN EXPERIMENTS

Experiment    Number of        Number of
Number        Pupils Tested    Schools
   1             300              3
   2             300              7
   3             300              [illegible]
   4             300              2
   5             302              7
   6             296              3
   7             308              1
   8             275              3
   9             307              1
  10             239              7
  11             306              7
  12             258              7
  13             280              4
  14             292              3


2. STANDARD PROCEDURE IN INDIVIDUAL
EXPERIMENTS


The same procedure was followed in each 
of the fourteen independent experiments. 
The steps in this procedure will be described 
in the remaining pages of this part of the
article.


a. Administration of the Tests


The criterion test was administered in the 
same manner in both the capitalization and 
the punctuation experiments. Without previ- 
ous warning, the pupils to be tested in a given 
school were called together at the same time 
and the sentences were dictated to them. To
avoid undue pupil fatigue, one half (Sections
A and B) was administered on one day and
the second half (Sections C and D) on the
day following.

To insure proper administration of the cri-
terion test, carefully prepared directions were
discussed and rehearsed thoroughly with each 
examiner. Copies of special directions were 
given to the pupils to be read silently while 
the examiner read them aloud. The pupils 
were told to copy the sentences, spelling each 
word correctly and supplying all necessary 
punctuation and capital letters. Each sen-
tence was first read aloud twice to the pupils
to give them an opportunity to grasp the 
meaning of the sentence. It was then read in 
two parts, with the pupils being given time 
between the parts to copy each on specially 
prepared ruled sheets. 

A week after the administration of the cri- 
terion test, and again without previous warn- 
ing, the experimental tests were administered 
to the same pupils. Again, carefully prepared 
directions for administering the tests were 
discussed thoroughly with each of the exam- 
iners. Each of the experimental forms was 
divided into two parts, each part based upon 
two of the four sections of the dictation test. 
For example, Form A-1 contains the content
of the first two sections of the dictation test
and Form A-2 that of the last two sections.

After all pupils to be tested in a given 
school were assembled, the tests were so dis- 
tributed that the odd-numbered pupils re- 
ceived the first part of one form and the re- 
maining pupils received the first part of the 
other form. After the pupils had written 
these tests, each pupil then wrote the second 
part of the form he had not yet taken. This 
procedure has two advantages. In the first 
place, if the order of writing the tests affects 
performance on them, the reversal of the 
order in the two groups should tend to 
equalize any such differences. Secondly, this 
method makes it unnecessary for the same 
pupil to react to the same situations more 
than twice throughout the entire experiment. 

All of the tests were administered under 
work-limit conditions. That is, all of the 
pupils were allowed to complete each of the 
tests written. This was done, first of all, be-
cause administering the tests under time-limit
conditions would require determining, for each
test, the optimum administration time for a
given amount of
material. Otherwise some tests might arbi- 
trarily be given more adequate time limits 
than others and thus be favored in the com- 
parisons on that account alone. As yet, no 
very satisfactory technique is available for 
determining the optimum time-limit for a 
given test. There is also the possibility that 
the optimum administration time of the test 
may vary from one school to another. If this 
is true, the optimum time limit would have 
to be determined separately for each school 
participating in an experiment. 

It is recognized, however, that most stand-
ardized tests are administered under time-
limit conditions. The time limits assigned to
these tests are probably based largely upon 
subjective opinion. It is further recognized 
that pupil performance on a test administered 
under work-limit conditions may differ from 
that on the same test when administered 
under time-limit conditions. However, it is 
believed that there is a sufficient relationship 
between the relative effectiveness of tests 
under work-limit conditions and under time- 
limit conditions to make an evaluation of the 
tests under the former conditions worthwhile. 

So that the tests could be compared with 
respect to the amount of time required to 
write each one, each pupil was directed to 
record on his test paper the number of min- 
utes required to write the test. The examiner 
wrote the elapsed time on the blackboard at 
the end of each minute. Thus, when a pupil 
completed his test, he simply copied down the 
number of minutes written on the blackboard. 


b. Scoring the Tests

The dictation test was scored, using a 
master copy of the test with the planted situ- 
ations identified on it. Each of the experi- 
mental tests was scored, using a carefully 
prepared stencil-type key. For the dictation 
test and for each of the experimental tests, 
the response of each pupil to each of the 
planted capitalization or punctuation situa- 
tions was recorded as right or wrong. 

For each pupil, two scores were established 
for each section and for each half of the test. 
In the capitalization experiments, an uncor- 
rected score was established by simply count- 
ing the number of correct responses to those 
situations requiring capitals. A corrected
score was determined by subtracting from the 
uncorrected score the number of incorrect re- 
sponses to those situations not requiring cap- 
itals. For the punctuation tests, the uncor- 
rected score consists of the number of correct 
responses to those situations requiring punc- 
tuation. The number of incorrect responses 
to those situations not requiring certain 
punctuation marks was subtracted from the 
uncorrected score to get the corrected score. 
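The two scores can be expressed in a few lines. This sketch assumes each pupil's marked responses are available as a list of hypothetical `Response` records, one per planted situation. The counts in the example follow the composition of one capitalization section of the dictation test (twenty-nine situations requiring capitals and five that do not), but the pupil's responses are invented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Response:
    requires_mark: bool  # the planted situation calls for a capital (or punctuation)
    correct: bool        # the pupil's response to the situation was right


def uncorrected_score(responses: list[Response]) -> int:
    """Number of correct responses to situations that require the mark."""
    return sum(1 for r in responses if r.requires_mark and r.correct)


def corrected_score(responses: list[Response]) -> int:
    """Uncorrected score minus incorrect responses to situations that do not
    require the mark (over-capitalization or over-punctuation)."""
    penalties = sum(1 for r in responses if not r.requires_mark and not r.correct)
    return uncorrected_score(responses) - penalties


# Hypothetical pupil on one capitalization section: right on 25 of the 29
# situations requiring capitals, and capitalizes 2 of the 5 that do not.
section = [Response(True, i < 25) for i in range(29)]
section += [Response(False, i >= 2) for i in range(5)]

print(uncorrected_score(section), corrected_score(section))  # 25 23
```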


c. Statistical Analysis of the Results 

The first step in the statistical analysis of 
the results was to prepare distributions of the 
scores on each section of the dictation test for 
each of the capitalization groups and for each 
of the punctuation groups, and to compute 
the mean and the standard deviation for each 
of the distributions. This was done to obtain
an empirical check upon the actual equiv-
alence of the four sections. After this check
was made, reliability coefficients were esti-
mated for the criterion test in each experi-
ment and for each of the two parts of each
of the experimental tests. Then, for each of
the experimental tests administered under
work-limit conditions, validity coefficients
were determined by computing the correlation
between the scores on the experimental test
and those on the corresponding parts of the
dictation test.
The usual procedure in studies of this kind
has been to compute the various correlations
for the total sample used in each experiment,
disregarding the fact that the sample is com-
posed of many smaller intact groups (classes
in individual schools). Samples of this type
may not be regarded as random samples, and
the usual random sampling error techniques
are not applicable to them, since large sys-
tematic differences in test performance from
school to school render the obtained correla-
tions much less stable than correlations ob-
tained from random samples of the same size.
Therefore, in this study all of the correlations
discussed are computed “within classes.” In
effect, what is done is to compute the cor-
relation for each class separately and then to
average the correlations for all of the classes
in a given experiment. Each correlation re-
ported in this investigation, then, is the typ-
ical or average correlation found within a
single class. These correlations are independ-
ent of the effect of systematic differences in
level of performance from school to school
and, so far as sampling error formulas are
concerned, may be considered, for practical
purposes, as computed from random samples.
The arithmetical procedure used to compute
these “within class” correlations is that of
analysis of co-variance, described by Fisher
(2, pp. 275-290).
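A minimal sketch of a pooled within-class correlation in the analysis-of-covariance manner follows. Each score is taken as a deviation from its own class mean, and the sums of squares and cross-products are pooled over classes before the correlation is formed. The result is, in effect, a weighted average of the within-class relationships rather than a literal mean of the class coefficients; whether it follows Fisher's worked layout step for step is an assumption. The function names and the toy scores are hypothetical.

```python
import math
from collections import defaultdict


def within_class_r(classes, xs, ys):
    """Pooled within-class correlation between two lists of scores.

    classes[i] identifies the class (or school) of pupil i. Scores are taken
    as deviations from their own class means, so differences in general level
    between classes drop out before sums of squares and products are pooled.
    """
    groups = defaultdict(list)
    for c, x, y in zip(classes, xs, ys):
        groups[c].append((x, y))

    sxx = syy = sxy = 0.0
    for members in groups.values():
        mean_x = sum(x for x, _ in members) / len(members)
        mean_y = sum(y for _, y in members) / len(members)
        for x, y in members:
            dx, dy = x - mean_x, y - mean_y
            sxx += dx * dx
            syy += dy * dy
            sxy += dx * dy
    return sxy / math.sqrt(sxx * syy)


def total_r(xs, ys):
    """Ordinary correlation over the pooled sample, ignoring class membership."""
    return within_class_r([0] * len(xs), xs, ys)


# Hypothetical scores: within each class the two tests agree perfectly,
# but the classes differ sharply in general level.
classes = ["X", "X", "X", "Y", "Y", "Y"]
experimental = [1, 2, 3, 10, 11, 12]
dictation = [2, 4, 6, 0, 2, 4]

print(round(within_class_r(classes, experimental, dictation), 2),
      round(total_r(experimental, dictation), 2))  # 1.0 -0.36
```

In this toy case the difference in level between the two classes makes the correlation over the combined sample negative, even though the relationship within each class is perfect.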
Since the dictation test may be considered
as four equivalent forms of the same test, the
reliability of the criterion was computed from
the intercorrelations between these forms.
The reliability of two sections of the dictation
test was estimated by means of the Spearman-
Brown prophecy formula from the average of
the six intercorrelations between the scores
on each of the four sections of the test. Like-
wise, the reliability of each of the two parts
of each of the experimental tests was esti-
mated from the intercorrelation computed
between the scores on the two sections in each
part. First the uncorrected and then the cor-
rected scores were used in computing the
various correlation coefficients.
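The reliability estimate described above reduces to two steps: average the six intercorrelations among Sections A to D, then step the average up to a test twice as long. The sketch below assumes the standard Spearman-Brown formula r' = kr / (1 + (k - 1)r) with k = 2; all coefficients in it are invented for illustration.

```python
from itertools import combinations
from statistics import mean


def spearman_brown(r: float, k: float = 2.0) -> float:
    """Predicted reliability of a test k times as long as one with reliability r."""
    return k * r / (1 + (k - 1) * r)


# Hypothetical within-class intercorrelations among the four equivalent
# sections of the dictation test (six pairs).
pairs = list(combinations("ABCD", 2))
section_r = dict(zip(pairs, [0.58, 0.61, 0.60, 0.59, 0.62, 0.60]))

# Reliability of two sections (the criterion used in each experiment).
criterion_reliability = spearman_brown(mean(section_r.values()))

# Reliability of one part of an experimental test, stepped up from the
# correlation between its two sections (hypothetical value).
part_reliability = spearman_brown(0.70)

print(len(pairs), round(criterion_reliability, 2), round(part_reliability, 2))  # 6 0.75 0.82
```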


An analysis was also made of the responses 
to the individual situations in the criterion 
test and in each of the experimental tests in 
each of the experiments. A four-celled con- 
tingency table, similar to the one illustrated 
below, was prepared for each of the situa- 
tions in each test. 


                      Dictation Test
                      R          W
Form A      R       49.0%       3.3%
            W       38.4%       9.3%

The values given in the illustration are for 
the first capitalization situation and are taken 
from the results obtained in the sixth experi- 
ment. They may be interpreted as follows: 
Of the 296 pupils writing the tests, forty-
nine per cent responded correctly to the
first situation on both Form A and the
dictation test. A little over thirty-eight per
cent of the pupils responded correctly to the
situation in the dictation test and incorrectly
in Form A. Over nine per cent of the pupils
erred on both tests, and over three per cent
of them responded incorrectly on the dicta-
tion test and correctly on Form A.


After a contingency table like the one 
above was prepared for each of the situations 
and for each of the experimental tests, the 
values in the cells were used to secure a 
measure of the relationship between the 
pupils’ responses to each situation in the 
criterion test and in each of the experimental 
tests. This was done to determine the relative 
effectiveness of the different tests with respect 
to the individual situations. The statistical 
measure computed for each situation was the 
tetrachoric correlation coefficient. Then for 
each of the experimental tests, these tetra- 
choric r’s were arranged in a frequency dis- 
tribution and the twenty-fifth, fiftieth, and 
seventy-fifth percentiles computed for each of 
the distributions. 
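The text does not say how the tetrachoric coefficients were computed. As a rough illustration only, the sketch below uses the cosine-pi approximation, r ≈ cos(π / (1 + √(ad/bc))), where a and d are the agreeing cells of the four-fold table and b and c the disagreeing ones. This is an approximation to the tetrachoric coefficient and should not be read as the study's method. The cell values are those of the illustrative table for the first capitalization situation in the sixth experiment; the list of coefficients passed to the quartile step is invented.

```python
import math
from statistics import quantiles


def tetrachoric_cos_pi(a: float, b: float, c: float, d: float) -> float:
    """Cosine-pi approximation to the tetrachoric correlation.

    a: right on both tests; d: wrong on both tests;
    b, c: the two cells where the tests disagree.
    Cells may be counts or percentages.
    """
    if a * d == 0 and b * c == 0:
        raise ValueError("degenerate four-fold table")
    if b * c == 0:
        return 1.0
    if a * d == 0:
        return -1.0
    return math.cos(math.pi / (1 + math.sqrt((a * d) / (b * c))))


# First capitalization situation, Experiment 6 (per cent of 296 pupils).
r = tetrachoric_cos_pi(a=49.0, b=3.3, c=38.4, d=9.3)
print(round(r, 2))  # 0.47

# Hypothetical coefficients for all situations in one experimental test,
# summarized by the 25th, 50th and 75th percentiles.
rs = [0.31, 0.47, 0.52, 0.58, 0.60, 0.64, 0.66, 0.71, 0.75, 0.82]
q1, median, q3 = quantiles(rs, n=4)
print(q1, median, q3)
```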
Finally, for each of the experimental tests
used in each experiment, frequency distribu- 
tions were prepared of the numbers of min- 
utes required by the individual pupils to 
write the tests. From these distributions the 
amount of time required for various percent- 
ages of the pupils to complete the tests when 
administered under work-limit conditions was 
determined. 

Descriptions of the procedure employed in 
the study of the other two problems, (1) the 
effect of practice upon the validity of a punc- 
tuation test, and (2) the importance of the 
proof reading factor in a test in punctuation, 
will be given in the sections devoted to them. 


THE RESULTS 


1. RESULTS CONCERNING THE CAPITAL- 
IZATION TESTS 

a. Reliability of the Criterion Tests 

In each experiment, the criterion used to 
evaluate the experimental tests was the total 
score on two sections of the dictation test. 
The reliability of this criterion was estimated
by means of the Spearman-Brown prophecy
formula from the average of the six within-
groups intercorrelations computed between
the four equivalent sections of the dictation 
test. These estimated reliabilities for the 
criterion test in capitalization are given in 
Table III for each of the six experiments. 


TABLE III

ESTIMATED RELIABILITY (WITHIN GROUPS) OF
A COMBINATION OF TWO SECTIONS OF THE
CRITERION TEST IN CAPITALIZATION IN EACH
OF THE SIX EXPERIMENTS


Experiment 


* Uncorrected scores used. 
** Corrected scores used. 


It will be recalled that the scores on the 
criterion tests were “corrected” by subtract- 
ing the number of incorrect responses to the 
situations not requiring capitalization from 
the number of correct responses to the situ- 
ations which did require capitalization. It 
may be observed that these corrections had 
little or no effect upon the reliability of the
obtained scores. The reliabilities of the cri- 
terion tests are not as high as might be de- 
sired, but are perhaps sufficiently high for use 
in evaluating the self-administering tests. 
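
The scoring rule just recalled can be expressed as a short function. This sketch rests on two assumptions not spelled out in the text: each situation is recorded as a pair (requires capitalization, pupil capitalized), and an "incorrect response" to a situation not requiring capitalization means the pupil capitalized it anyway. The function name is hypothetical.

```python
from typing import Iterable, Tuple


def capitalization_scores(situations: Iterable[Tuple[bool, bool]]) -> Tuple[int, int]:
    """Return (uncorrected, corrected) scores.

    Each item is (required, capitalized). The uncorrected score counts correct
    responses to situations requiring a capital; the corrected score subtracts
    capitals wrongly placed in situations that do not require one.
    """
    hits = false_marks = 0
    for required, capitalized in situations:
        if required and capitalized:
            hits += 1
        elif not required and capitalized:
            false_marks += 1
    return hits, hits - false_marks


paper = [(True, True), (True, False), (False, True), (False, False), (True, True)]
print(capitalization_scores(paper))  # prints (2, 1)
```
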


b. Time Requirements of the Tests 

From the number of minutes recorded by 
each pupil on his test paper, frequency dis- 
tributions were prepared of the numbers of 
minutes required by the individual pupils to 
write each of the tests. A summary of these 
distributions is presented in Table IV. The 
“estimated optimum administration time” re- 
ported for each of the tests is the time re- 
quired by eighty-four per cent of the pupils 
to complete the test when administered under 
work-limit conditions. This, according to the 
findings of Cook (1), is presumably the time 
limit under which the test would produce the 
maximum validity when administered under 
time-limit conditions. While it has not been 
conclusively established that the time re- 
quired for completion by eighty-four per cent 
(or any other per cent) of the pupils is the 
true optimum administration time for all 
types of tests in any field, it perhaps repre- 
sents as good a basis as any other for com-
paring the tests as to time requirements.

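The "estimated optimum administration time" is the 84th percentile of the pupils' completion times. The sketch below assumes ungrouped completion times and linear interpolation between ordered observations. The article worked from grouped frequency distributions, so its interpolation may differ slightly. The function name and the sample times are hypothetical.

```python
import math


def time_for_fraction(minutes: list[float], fraction: float = 0.84) -> float:
    """Minutes by which `fraction` of the pupils had finished, interpolating linearly."""
    if not minutes or not 0.0 <= fraction <= 1.0:
        raise ValueError("need at least one time and a fraction in [0, 1]")
    data = sorted(minutes)
    pos = fraction * (len(data) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])


times = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # illustrative
for pct in (0.25, 0.50, 0.75, 0.84, 0.95):
    print(f"{pct:.0%}: {time_for_fraction(times, pct):.2f} min")
```
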
From the summary given in Table IV it 
may be seen that the time required for admin- 
istration differed considerably from one type 
of test to another. The time requirements for 
Form A, for example, are, in general, less 
than those for the other tests. Forms C and 
P, both of which make use of a separate 
answer sheet, require considerably more time 
than do Forms A and B, in which the re- 
sponses are made in the test copy itself. 


c. Reliability of the Capitalization Tests 

Reliability coefficients for the various cap- 
italization tests were estimated in each of the 
experiments in the following manner. After 
the within groups intercorrelation was com- 
puted between the scores on the two equiv- 
alent sections of each test, the Spearman- 
Brown prophecy formula was applied to esti- 
mate the reliability of the total test. These 
estimated reliability coefficients are reported 
in Table V. 
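
The split-half procedure can be sketched as follows. The sketch assumes that "within groups" means deviations are taken from each class's own means before the two section scores are correlated, so that differences between class averages do not inflate the coefficient. The data layout and function names are hypothetical. The step-up to full length uses the Spearman-Brown formula with k = 2.

```python
import math
from statistics import mean


def within_groups_r(classes: list[tuple[list[float], list[float]]]) -> float:
    """Pooled within-class Pearson correlation between two sets of section scores."""
    sxy = sxx = syy = 0.0
    for xs, ys in classes:
        mx, my = mean(xs), mean(ys)
        for x, y in zip(xs, ys):
            dx, dy = x - mx, y - my
            sxy += dx * dy
            sxx += dx * dx
            syy += dy * dy
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(classes: list[tuple[list[float], list[float]]]) -> float:
    """Step the half-test correlation up to full length (Spearman-Brown, k = 2)."""
    r_half = within_groups_r(classes)
    return 2 * r_half / (1 + r_half)


# Hypothetical section scores for two classes with different average levels.
classes = [([1, 2, 3], [1, 3, 2]), ([4, 5, 6], [4, 5, 6])]
print(round(within_groups_r(classes), 3), round(split_half_reliability(classes), 3))
```

With these illustrative scores the pooled within-class correlation is .75, and the stepped-up reliability is about .857.
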

The scores obtained on the experimental 
tests were “corrected” in the same manner as 
were those for the criterion tests. It may be 


TABLE IV 


SUMMARY OF THE DISTRIBUTIONS OF THE TIME REQUIRED BY VARIOUS PERCENTAGES OF THE
PUPILS TO COMPLETE EACH OF THE CAPITALIZATION TESTS


                                Number of Minutes              Estimated*
      Experiment  Test                                         Optimum
Form  Number      Number    25%     50%     75%     95%        Time
A     1           1         6.76    7.83    9.19    10.97      9.94
                  2         5.72    7.00    8.69    11.88      9.89
B     1           1         6.88    8.52    9.82    12.83      10.57
                  2         5.86    6.83    8.59    12.21      10.07
A     2           1         6.44    7.53    8.88    11.67      9.63
                  2         5.60    6.89    8.25    11.48      9.67
C     2           1         10.55   12.08   14.24   18.65      15.05
                  2         9.11    10.81   12.86   15.68      13.90
A     3           1         5.77    7.10    8.16    11.17      9.00
                  2         5.83    6.89    8.29    11.19      8.86
D     3           1         9.90    11.71   18.88   16.95      14.83
                  2         8.31    10.17   11.75   14.64      12.64
C     4           1         12.60   14.76   17.50   20.93      18.59
                  2         10.39   12.10   14.50   19.08      15.33
E     4           1         7.50    8.83    10.60   13.65      11.52
                  2         9.71    12.18   14.82   18.90      16.39
A     5           1         5.14    6.46    7.77    9.87       8.37
                  2         5.03    6.29    8.07    11.78      9.07
E     5           1         7.50    9.19    11.31   15.41      12.47
                  2         7.14    8.81    10.60   13.79      13.90
A     6           1         5.34    6.35    7.47    9.78       7.99
                  2         4.80    5.61    6.59    8.34       7.06
P     6           1         10.05   11.56   13.48   16.25      14.62
                  2         9.01    10.40   11.95   14.94      13.11


* The number of minutes required for eighty-four per cent of the pupils to complete the test.
TABLE V

ESTIMATED RELIABILITY COEFFICIENTS (WITHIN GROUPS) FOR THE CAPITALIZATION TESTS
IN EACH OF THE EXPERIMENTS

      Experiment   Test No. 1      Test No. 2
Form  Number       US*    CS**     US*    CS**
A     1            .83    .82      .82    .82
B     1            .77    .76      .68    .72
A     2            .80    .78      .77    .82
C     2            .82    .69      .84    .82
A     3            .91    .90      .89    .86
D     3            .79    .80      .85    .84
C     4            .84    .82      .68    .66
E     4            .78    .76      .86    .76
A     5            .89    .82      .78    .73
E     5            .79    …        .78    .68
A     6            .90    .88      .83    .85
P     6            .86    .88      .85    .91


* Uncorrected scores used.
** Corrected scores used.


noted that in most instances “correcting” the 
scores does not greatly affect their reliability. 
It may also be observed that the estimated 
reliability coefficients for Form A, which was 
used in five of the experiments, range in value 
from .77 to .91 in the various experiments
when the uncorrected scores are used. This
range in reliabilities for the same test is larger
than may reasonably be attributed to
fluctuations in random sampling, and suggests that
the true reliability of any of these tests may 
differ from school to school. 

Even aside from this possibility, the reli- 
abilities reported in Table V are not strictly 
comparable from test to test, since the admin- 
istration time varies considerably for the 
different tests. Therefore, in order to increase 
the comparability of these reliability coefficients,
they were “corrected” for differences
in administration time. This was done by
estimating, by means of the Spearman-Brown
prophecy formula, what the reliability
would be for a similar test of such a length 
that, when administered under work-limit 
conditions, eighty-four per cent of the pupils 
would finish the test in ten minutes. Although 
this procedure is somewhat questionable, it 
is believed that the reliability coefficients 
corrected in this manner for differences in 
administration time are more comparable 
than the reliabilities estimated for the tests 
for unequal testing times. These “corrected” 
reliabilities are given in Table VI. 
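
The adjustment to a ten-minute standard amounts to the Spearman-Brown formula with N = 10 / t, where t is the estimated optimum time from Table IV. This rests on the assumption, made in the text, that time is proportional to test length. As a check on this reading, a reliability of .82 with an optimum time of 15.05 minutes (Test No. 1 of Form C in the second experiment, Tables V and IV) gives .75. Likewise, .83 with 9.94 minutes (Test No. 1 of Form A in the first experiment) gives .83. Both match the uncorrected-score entries in Table VI. The function name is hypothetical.

```python
def reliability_at_standard_time(r: float, optimum_minutes: float,
                                 standard_minutes: float = 10.0) -> float:
    """Spearman-Brown estimate for a test lengthened or shortened to the standard time."""
    n = standard_minutes / optimum_minutes  # assumed: length ratio equals time ratio
    return n * r / (1 + (n - 1) * r)


print(round(reliability_at_standard_time(0.82, 15.05), 2))  # Form C, exp. 2, Test No. 1
print(round(reliability_at_standard_time(0.83, 9.94), 2))   # Form A, exp. 1, Test No. 1
```
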

Although the differences in most of the experiments are not large, the reliability coefficients corrected for differences in administration time favor Form A in all but one instance. The one exception is the reliability of Test No. 2 in the second experiment, for the uncorrected scores. Even in this experiment, however, the reliability of Form A is higher than that of Form C, the other test used, in three of the four comparisons. Because there is no test of significance of differences between reliabilities estimated in this fashion, and also because of the possibility previously mentioned that the reliability differs from school to school, very little can be said about the relative reliability of the tests. The fact that in each of the experiments in which it was used the estimated reliability of Form A is, with this one exception, numerically greater than that of the other test does suggest, however, that Form A is somewhat superior in this respect to the other types of capitalization tests used.


TABLE VI

RELIABILITY COEFFICIENTS OF THE CAPITALIZATION TESTS WHEN “CORRECTED” FOR
DIFFERENCES IN ADMINISTRATION TIME

      Experiment   Test No. 1      Test No. 2
Form  Number       US*    CS**     US*    CS**
A     1            .83    .82      .82    .82
B     1            .76    .75      .68    .72
A     2            .81    .79      .78    .82
C     2            .75    .59      .79    …
A     3            .92    .91      .90    .87
D     3            .72    .73      .82    .81
C     4            .74    .71      .58    .55
E     4            .76    .73      .79    .66
A     5            .91    .84      .81    .75
E     5            .75    .67      .72    .60
A     6            .92    .90      .87    .89
P     6            .81    .83      .81    .88


* Uncorrected scores used.
** Corrected scores used.


d. Relative Validity of the Capitalization Tests

In each experiment, the correlation within 
classes was computed between the scores on 
each test and on the corresponding parts of 
the criterion test. These correlations repre- 
sent the validity coefficients obtained for 
each of the tests when administered under 
work-limit conditions. An examination of 
Table VII reveals that the validity coefficients 
show considerable variability from one ex- 
periment to another, even for the same form. 
For Form A, which was used in five of the 
experiments, the coefficients obtained, using 
the uncorrected scores, range in value from 
.35 for Test No. 2 in the first experiment to 
.72 for Test No. 1 in the sixth experiment. 
This difference is greater than that between any two different types of tests, and is unquestionably greater than can be attributed to chance fluctuations in random sampling. Again, as in the case of the reliability analysis, this suggests that the same test may have one degree of validity in one school and another degree of validity in another school, and that, therefore, any validity coefficient based upon the results from only a few schools may be very unstable.
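
The claim that a spread from .35 to .72 exceeds sampling error can be checked with the usual comparison of two independent correlations through Fisher's z transformation. This sketch is not the author's procedure. It uses the Ns of 144 and 151 shown for these two tests in Table VII and the standard error sqrt(1/(N1 - 3) + 1/(N2 - 3)), which ignores the degrees of freedom lost in computing correlations within classes. The function name is hypothetical.

```python
import math


def correlation_difference_z(r1: float, n1: int, r2: float, n2: int) -> float:
    """Standard normal deviate for the difference between two independent correlations."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (math.atanh(r2) - math.atanh(r1)) / se


z = correlation_difference_z(0.35, 144, 0.72, 151)
print(f"z = {z:.2f}")
```

The resulting deviate, about 4.6, lies well beyond the conventional 1.96 and 2.58 points. This is consistent with the judgment expressed in the text.
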


In view of these large differences between 
the validity coefficients obtained for Form A 
in the different experiments, the writer feels 
compelled to hazard some conjectures con- 
cerning the causes. In the first place, such 
differences might be at least partially caused 
by differences in the conditions under which 
the criterion and experimental tests were ad- 
ministered in the schools participating in the 
various experiments. To illustrate, the pupils 
in one experiment might possibly have been 
more highly motivated than were those tak- 
ing the tests in another experiment. Because 
special care was taken to prevent such differ- 
ences in administrative conditions, it does not 
seem reasonable to attribute the large differ- 
ences in validity coefficients to this factor 
alone. 


Secondly, differences in the previous training of the pupils included in the different experiments could account for differences in the obtained validities. For example, it is possible that the pupils in some of the schools were accustomed to writing dictation tests or exercises in capitalization, while those in other schools were not. It does not seem unreasonable to assume that in some schools the use of dictation drills has been a part of the regular program in the English classes, and that such drills were never or only very infrequently used in other schools. Such differences in the experiences and previous training of the pupils in different schools would result in differences in their performance on the criterion measure employed in this investigation. It may also have been that the pupils in some schools were accustomed to writing various types of self-administering tests in capitalization, while those in other schools were not. Such differences in the experiences of the pupils could also contribute to differences in the validity of the same test from school to school.


TABLE VII

THE VALIDITY COEFFICIENTS (WITHIN GROUPS) OBTAINED FOR THE EXPERIMENTAL
TESTS IN CAPITALIZATION

      Experiment   Test No. 1            Test No. 2
Form  Number       N     Vus*   Vcs**    N     Vus*   Vcs**
A     1            156   .45    .45      144   .35    .51
B     1            144   .48    .51      156   .41    …
A     2            152   .50    .61      148   .57    .66
C     2            148   .49    .62      152   .48    …
A     3            149   .63    .67      146   .66    .72
D     3            146   …      .58      149   .49    .58
C     4            148   .42    .51      152   .59    .68
E     4            152   .47    .62      148   .55    .58
A     5            153   .60    …        149   .48    .61
E     5            149   .39    .55      153   …      …
A     6            151   .72    .68      145   .67    .62
P     6            145   .68    .71      151   .52    .56


* Uncorrected scores used.
** Corrected scores used.

If it is true that there is a real difference 
in the true validity of a given capitalization 
test from one school to another, the differ- 
ences observed in the validity coefficients 
obtained for the same test from one experi- 
ment to another in this study can be explained 
quite readily. In one experiment involving 
only two or three schools, for example, one 
might by chance get schools in all of which 
the test would show a low validity coefficient. 
Also, by chance the few schools in a second 
experiment might, conversely, be schools in 
all of which the test would have a relatively 
high validity. It is apparent that the validity 
coefficient obtained in either of these experi- 
ments would not be a satisfactory description 
of the general validity of the test. 


Because the time requirements differ from 
one test to another, the validity coefficients 
reported in Table VII are rendered still fur- 
ther incomparable for the various types of 
tests. Lindquist and Cook (4, p. 167) have 
pointed out that “. . . the tests should be 
compared in terms of validity for a standard
period of time, when each test is administered 
at its own optimum rate.” To make the val- 
idities of the different tests more comparable 
with reference to this factor, they were “cor- 
rected” for differences in the time require- 
ments. 

Using the time required for eighty-four per 
cent of the pupils to complete the test as the 
basis for the correction, coefficients obtained 
for the several tests were “corrected” for dif- 
ferences in administration time by means of 
the following formula provided by Kelley (3, p. 200) for the correlation between a criterion and the sum or average of a number of equally weighted scores:

    V' = \frac{V}{\sqrt{\dfrac{1 - r_t}{N} + r_t}}

In the formula, V is the validity of the test in its original length; r_t is the reliability of the test in its original length; N equals the proportionate increase in number of items, and hence presumably in the time required for eighty-four per cent of the pupils to finish; and V' is the estimated validity of the test so lengthened.
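
Kelley's formula can be written as a small function. As a check on this reading of the formula, take Test No. 1 of Form C in the second experiment: validity .49 (Table VII), reliability .82 (Table V), and optimum time 15.05 minutes (Table IV), so N = 10/15.05. These values give .47, the entry shown in Table VIII. The function name is hypothetical, and N is taken as ten minutes divided by the estimated optimum time, following the assumption stated in the text.

```python
import math


def validity_at_standard_time(v: float, r_t: float, optimum_minutes: float,
                              standard_minutes: float = 10.0) -> float:
    """Kelley's V' = V / sqrt((1 - r_t)/N + r_t), with N = standard / optimum time."""
    n = standard_minutes / optimum_minutes  # proportionate change in length (assumed)
    return v / math.sqrt((1 - r_t) / n + r_t)


print(round(validity_at_standard_time(0.49, 0.82, 15.05), 2))  # Form C, exp. 2, Test No. 1
```

As N grows without limit, V' approaches V divided by the square root of r_t. That limit is the validity the test would have if it were perfectly reliable.
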

In using the above formula, the assump- 
tion is made that the validity coefficients thus 
estimated would equal those obtained if the 
amount of equivalent material included in 
each test were such that each test would re- 
quire the same amount of time (ten minutes) 
for eighty-four per cent of the pupils to finish 
the test when administered under work-limit 
conditions. It should also be recognized that if the tests were actually administered under time-limit conditions, the resulting validities (because of probable differences in the mental set of the pupils) might be different.


TABLE VIII

VALIDITY COEFFICIENTS “CORRECTED” FOR DIFFERENCES IN ADMINISTRATION TIME FOR
EACH OF THE CAPITALIZATION TESTS

      Experiment   Test No. 1            Test No. 2
Form  Number       N     Vus*   Vcs**    N     Vus*   Vcs**
A     1            156   .46    .45      144   .35    .51
B     1            144   .48    .51      156   …      .51
A     2            152   .50    .61      148   .57    .66
C     2            148   .47    .57      152   .47    .61
A     3            149   .63    .67      146   .66    .73
D     3            146   .54    .55      149   .48    .57
C     4            148   .39    .48      152   .54    .62
E     4            …     …      …        148   .53    .54
A     5            153   …      …        149   .48    .62
E     5            149   .38    .53      153   .45    .52
A     6            151   .73    .69      145   .68    .68
P     6            145   .66    .69      151   .51    .55


* Uncorrected scores used.
** Corrected scores used.

In all but one of the five experiments in 
which Form A was used, the “corrected” 
validity coefficient for Form A is higher than 
that of the other test used. This one excep- 
tion is found in the first experiment where 
Form A has its lowest validity coefficient. In 
this experiment, the “corrected” validity co- 
efficient for Form B is slightly higher than 
that for Form A. Because of inconsistencies even within the same experiment, because there is no test of significance of differences between validity coefficients estimated in this manner, and also because of the possibility previously suggested that the validity of a test differs from school to school, little can be said about the relative validity of the tests on the basis of the findings reported in Table VIII.

It was nevertheless desired to obtain some 
generalized description of the validity of the 
different types of capitalization tests evalu- 
ated in this study. Therefore, the validity co- 
efficients were averaged for each of the forms, 
using the procedure given by Fisher (2, pp. 207-208) for combining correlation values from independent samples. Since this method of averaging correlation values is not valid if true differences in correlation exist from one school to another, the use of this procedure may be questionable in this case. Nevertheless, it is believed that the averages thus obtained represent perhaps the best possible estimates of the general validity of each of the different tests.
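
Fisher's method converts each r to z = artanh r, averages the z values, and converts the mean back to r. The sketch below weights each z by N - 3. That weighting is an assumption, since the text cites Fisher without stating the weights. Take the two "corrected" coefficients for Form P in Table VIII, .66 with N = 145 and .51 with N = 151 (uncorrected scores). The sketch returns .59, the value reported for Form P in Table IX. The function name is hypothetical.

```python
import math


def average_correlation(rs: list[float], ns: list[int]) -> float:
    """Average independent correlations through Fisher's z, weighting each z by n - 3."""
    if len(rs) != len(ns) or not rs:
        raise ValueError("rs and ns must be non-empty and of equal length")
    weights = [n - 3 for n in ns]
    z_bar = sum(w * math.atanh(r) for r, w in zip(rs, weights)) / sum(weights)
    return math.tanh(z_bar)


print(round(average_correlation([0.66, 0.51], [145, 151]), 2))  # Form P, uncorrected scores
```
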

Because the average validities reported in Table IX are only estimates whose precision is not known, the superiority of one test over another cannot be established conclusively. Therefore, it may be only very tentatively concluded that Forms A and P are superior to the other types of capitalization tests evaluated.


TABLE IX

THE “CORRECTED” VALIDITY COEFFICIENTS
AVERAGED FOR EACH OF THE
CAPITALIZATION TESTS

Form  N      US*    CS**
A     …      .58    .62
B     …      .44    .51
C     588    .46    .54
D     295    .51    .56
E     590    .46    .53
P     296    .59    .62


* Uncorrected scores used.
** Corrected scores used.


JOURNAL OF EXPERIMENTAL EDUCATION [Vol. 8, No. 1


e. Results of the Item Analysis 


A final evaluation may be made of the 
different types of capitalization tests from an 
item analysis of the responses to the indi- 
vidual situations in each of the tests. For 
each of the tests the tetrachoric correlation 
coefficient between the responses to each of 
the situations in the criterion test and in the 
experimental test was determined. Then for
each form a distribution was prepared of these
tetrachoric r’s, and the twenty-fifth, fiftieth,
and seventy-fifth percentiles were computed.
Table X presents these quartile values for
each of the tests in each of the six experiments.
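
The item analysis can be sketched as follows. The article does not say how the tetrachoric coefficients were computed. This sketch therefore substitutes the well-known cosine-pi approximation, r ≈ cos(π / (1 + sqrt(ad/bc))), for an exact bivariate-normal solution, and it treats any 2×2 table with an empty cell as indeterminate.

Each table is hypothetical and laid out as (a, b, c, d), where a = right on both the criterion and the experimental test, d = wrong on both, and b and c are the discordant cells. The quartiles come from Python's statistics.quantiles with its default method, which may differ slightly from the author's grouped-data percentiles.

```python
import math
from statistics import quantiles
from typing import Optional


def tetrachoric_cos_pi(a: int, b: int, c: int, d: int) -> Optional[float]:
    """Cosine-pi approximation to the tetrachoric r of a 2x2 table.

    a = right on both tests, d = wrong on both, b and c = discordant cells.
    Returns None (indeterminate) when any cell is empty.
    """
    if min(a, b, c, d) == 0:
        return None
    return math.cos(math.pi / (1 + math.sqrt((a * d) / (b * c))))


def item_validity_quartiles(tables):
    """Return (number determinate, Q3, median, Q1) of the item tetrachoric r's."""
    rs = [r for r in (tetrachoric_cos_pi(*t) for t in tables) if r is not None]
    q1, median, q3 = quantiles(rs, n=4)
    return len(rs), q3, median, q1


items = [(40, 10, 10, 40), (30, 20, 20, 30), (25, 25, 25, 25), (45, 5, 10, 40), (20, 0, 10, 30)]
n, q3, median, q1 = item_validity_quartiles(items)
print(n, round(q3, 2), round(median, 2), round(q1, 2))
```
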


TABLE X


THE VALIDITY OF THE INDIVIDUAL SITUATIONS
IN EACH OF THE CAPITALIZATION TESTS

                               Tetrachoric r
      Experiment
Form  Number       N*     Q3     Median   Q1
A     1            73     .44    .33      .22
B     1            60     …      …        …
A     2            82     .52    .40      .27
C     2            102    .50    .37      .25
A     3            98     .64    .52      .40
D     3            100    .41    .30      .16
C     4            84     .51    .38      .23
E     4            85     .51    .41      .29
A     5            84     .51    .41      .31
E     5            89     .45    .30      .20
A     6            88     .55    .49      .39
P     6            69     .60    .50      .42


* The number of the 136 planted capitalization situations for which the tetrachoric r was determinate.


In each of the tests there were some situ- 
ations for which the value of the tetrachoric 
r was indeterminate. For those situations for which the coefficient was determinate, the median value gives an approximation of the average or typical validity of the individual situations in the test. The results reported in Table X again emphasize the variability in the relative effectiveness of Form A from one experiment to another. Although the value of the median coefficient for Form A is not constant from one experiment to another, it shows Form A to be somewhat superior in this respect in most of the experiments. One of the two exceptions is in the first experiment, where the median validity of the situations in Form A was the lowest found for that form in any of the experiments. The other is in the sixth experiment, where the values obtained for Form P are slightly higher than those for Form A. On the basis of these results, no conclusive generalizations can be made.


2. RESULTS CONCERNING THE PUNCTUATION TESTS


a. Reliability of the Criterion Tests 

In each experiment, the reliability of the
criterion test was estimated in the same way
as was that for the criterion test in capitalization.
These estimated reliabilities are reported in Table XI. Each of the scores on
the test was “corrected” by subtracting the 
number of incorrect responses to the situa- 
tions not requiring certain punctuation from 
the number of correct responses to the situ- 
ations which did require punctuation. It may 
be observed that “correcting” the scores in 
this way had little effect upon their estimated 
reliability. These reliabilities, though not as 
high as might be desired, are probably suffi- 
ciently high for use in evaluating the experi- 
mental tests. 


TABLE XI 


ESTIMATED RELIABILITY (WITHIN GROUPS) OF
A COMBINATION OF TWO SECTIONS OF THE
CRITERION TEST IN PUNCTUATION IN EACH
OF THE EIGHT EXPERIMENTS


Experiment 
Number US* CS** 
7 Si cee aes .89 
Dt oe ee ee .90 
a) Radiata ic deceg lat paisa se ean 91 
10 wa lps ia cate lia simcadeaies ame 91 
11 = poibna een ee 94 
12 SSRI NTRS CREE PETER: 93 .92 
| Se ae ee 92 
fn eee om Meee er ren eee 92 .92 


* Uncorrected scores used.
** Corrected scores used.


b. Time Requirements of the Tests 

From the distributions of the numbers of 
minutes required for the individual pupils to 
complete the tests, the number of minutes re- 
quired for various percentages of the pupils 
to complete each of the tests was determined. 
A summary of these results is given in Table 
XII. These results show that the amount of 
time required for a given percentage of the 
pupils to complete a punctuation test varies 
with the type of test. It is evident that the 
number of minutes required to write an objective
test in punctuation depends upon the
manner in which the pupils are required to 
make their responses. It may also be noted 
that the techniques requiring the use of a 
separate answer sheet require a relatively 
large amount of time. Another observation of 
interest is that the time requirements for 
Form F are considerably larger in two of the 
experiments (the thirteenth and fourteenth) 
than they are in the others. The basis for 
estimating the optimum time limits was de- 
scribed in connection with the time require- 
ments of the capitalization tests. 


c. Reliability of the Punctuation Tests 


The reliability of each of the punctuation
tests was estimated by means of the
Spearman-Brown prophecy formula from the
intercorrelation computed between the scores 
on the two sections comprising each of the 
tests. These estimated reliabilities are given 
in Table XIII. 


The scores on each of the experimental 
tests were “corrected” in the same manner as 
were those on the criterion test. It may be 
observed that these corrections had little 
effect upon the reliability of the obtained 
scores. When the uncorrected scores are used, 
the estimated reliability of Form F varies 
from .88 for Test No. 2 in the seventh ex- 
periment to .94 for Test No. 1 in the thir- 
teenth experiment. This range is larger than 
may reasonably be attributed to fluctuations 
in random sampling and suggests the possi- 
bility that the true reliability of any of these 
tests may differ from school to school. These 
results tend to agree in this respect with those 
found for the capitalization tests. 


Even apart from the possibility that the 
true reliability of the tests may differ from 
school to school, the reliabilities reported in 
Table XIII are not comparable from test to 
test, since the administration time varies con- 
siderably for the various tests. The reliabil- 
ities for the different tests were, therefore, 
“corrected” for differences in administration 
time. The method of correction employed was 
the same as that used in “correcting” the re- 
liabilities of the capitalization tests for differ- 
ences in administration time. These “cor- 
rected” reliability coefficients are given in 
Table XIV. 
TABLE XII

SUMMARY OF THE DISTRIBUTIONS OF THE TIME REQUIRED BY VARIOUS PERCENTAGES OF THE
PUPILS TO COMPLETE EACH OF THE PUNCTUATION TESTS

                                Number of Minutes              Estimated*
      Experiment  Test                                         Optimum
Form  Number      Number    25%     50%     75%     95%        Time
F     7           1         7.14    8.96    10.55   …          11.46
                  2         7.26    …       10.27   12.65      11.12
G     7           1         8.24    9.63    11.70   14.27      12.44
                  2         8.09    9.58    11.46   15.31      12.66
F     8           1         8.38    10.21   11.56   …          11.92
                  2         8.10    …       12.68   15.25      13.23
I     8           1         11.25   13.08   16.75   20.45      17.94
                  2         13.15   15.60   …       20.67      18.76
F     9           1         6.59    9.94    10.74   …          …
                  2         8.37    9.74    10.64   15.07      11.64
J     9           1         …       15.98   20.10   20.82      20.43
                  2         15.01   15.90   18.35   20.73      20.15
I     10          1         13.12   14.91   17.42   22.57      18.71
                  2         10.04   11.17   14.56   20.78      18.52
J     10          1         15.20   16.72   …       24.95      21.66
                  2         14.06   16.00   19.15   23.35      19.92
F     11          1         7.17    8.30    9.99    …          10.63
                  2         8.24    9.73    11.29   15.15      11.91
K     11          1         11.21   12.67   14.63   18.72      15.38
                  2         …       14.26   …       20.39      16.94
F     12          1         …       …       …       …          10.94
                  2         6.37    7.61    9.35    11.69      7.90
L     12          1         19.29   23.38   27.57   31.78      29.36
                  2         …       24.10   27.32   32.92      29.02
F     13          1         10.09   11.64   15.42   20.67      16.94
                  2         9.37    11.17   13.64   18.00      14.96
M     13          1         15.04   17.57   20.59   25.25      22.07
                  2         13.71   15.94   18.80   22.67      19.60
F     14          1         8.57    10.42   13.46   17.90      14.78
                  2         8.68    10.74   13.19   17.34      14.40
N     14          1         19.63   23.12   26.83   31.63      28.66
                  2         19.10   22.32   25.55   31.27      27.44


* The number of minutes required for eighty-four per cent of the pupils to finish the test.














September, 1939] 


CAPITALIZATION AND PUNCTUATION 73 


TABLE XIII


ESTIMATED RELIABILITY COEFFICIENTS (WITHIN GROUPS) FOR THE PUNCTUATION TESTS
IN EACH OF THE EXPERIMENTS

      Experiment   Test No. 1      Test No. 2
Form  Number       US*    CS**     US*    CS**
F     7            .91    .88      .88    .89
G     7            .91    .88      .88    .87
F     8            .90    .89      .95    .94
I     8            .90    .89      .89    .86
F     9            .90    .89      .92    .90
J     9            .57    .55      …      .44
I     10           .78    .76      .90    .88
J     10           .80    .77      .81    .80
F     11           .92    .94      .93    .93
K     11           .84    .89      .92    .91
F     12           .90    .88      .93    .92
L     12           .85    .86      .87    .88
F     13           .94    .94      .91    .92
M     13           …      .86      .80    .88
F     14           .91    .90      .89    .88
N     14           …      .80      .84    .84

* Uncorrected scores used.
** Corrected scores used.
TABLE XIV


RELIABILITY COEFFICIENTS OF THE PUNCTUATION TESTS WHEN “CORRECTED” FOR
DIFFERENCES IN ADMINISTRATION TIME

      Experiment   Test No. 1      Test No. 2
Form  Number       US*    CS**     US*    CS**
F     7            .90    .86      .87    .88
G     7            .89    .85      .85    .84
F     8            .88    .87      .94    .92
I     8            .83    .82      .81    .77
F     9            .89    .88      .91    .89
J     9            .39    …        .40    .28
I     10           .65    .63      .83    .80
J     10           .65    .61      .68    .67
F     11           .92    .94      .92    .92
K     11           …      .84      .87    .86
F     12           .89    .87      .94    .94
L     12           .66    .68      .69    .71
F     13           .90    .90      .87    .89
M     13           …      …        .67    .79
F     14           .87    .86      .85    .83
N     14           .54    .58      .65    .65


* Uncorrected scores used.
** Corrected scores used.


The reliability coefficients as corrected for differences in administration time represent estimates of the reliabilities that would be obtained if the amount of material in each test were such that eighty-four per cent of the pupils would complete it in ten minutes when it was administered under work-limit conditions. It may be noted that the reliability of Form F is, in general, somewhat higher than that of the other forms, although in some instances this superiority is not marked. For Forms J, L, and N, all of which employ a separate answer sheet, the reliabilities are somewhat lower than those for the other forms. Because there is no test of significance of differences between reliabilities estimated in this manner, and also because of the possibility already suggested that the reliability differs from school to school, little can be said about the relative reliability of the different tests. The fact, however, that the estimated reliability of Form F is consistently higher than that of the other forms supports the tentative conclusion that it is superior in this respect to the other tests evaluated.
TABLE XV

THE VALIDITY COEFFICIENTS (WITHIN GROUPS) OBTAINED FOR THE EXPERIMENTAL
TESTS IN PUNCTUATION

      Experiment   Test No. 1            Test No. 2
Form  Number       N     Vus*   Vcs**    N     Vus*   Vcs**
F     7            154   .66    .70      154   .78    .75
G     7            154   .53    .52      154   .66    …
F     8            140   .78    .79      135   .81    .83
I     8            135   .65    .69      140   .58    .59
F     9            …     …      …        150   .69    .72
J     9            150   …      .61      157   .59    .57
I     10           108   .55    .61      124   .60    .60
J     10           124   .59    .59      108   .60    .62
F     11           149   .76    .78      157   .83    .84
K     11           157   .64    .73      149   .59    …
F     12           134   …      .81      124   .84    .84
L     12           124   .52    .67      134   .48    …
F     13           140   .78    .80      140   .82    …
M     13           140   .60    .65      140   .48    …
F     14           146   .75    …        …     …      …
N     14           146   .49    .62      146   .53    .67


* Uncorrected scores used.
** Corrected scores used.


d. Relative Validity of the Punctuation Tests 

Validity coefficients were determined for 
each of the tests administered under work- 
limit conditions in each of the experiments by 
computing the correlation between the scores 
on the experimental test and those on the 
corresponding part of the criterion test. The 
validity coefficients obtained in this manner 
are reported in Table XV. 

It may be observed that the validity ob- 
tained for Form F, for example, varies in 
value from .66 for Test No. 1 in the seventh 
experiment to .84 for Test No. 2 in the 
twelfth experiment, when the uncorrected 
scores are used. This range is greater than 
can reasonably be attributed to chance fluc- 
tuations in random sampling. As in the case 
of the capitalization tests, this again suggests 
the possibility that the true validity of a given 
test may differ from one school to another, 
and that, therefore, a validity coefficient 
obtained from the results in only a few schools 
may be very unstable. 

To make the validity coefficients reported 
in Table XV more comparable from test to 
test, they were “corrected” for differences in 
administration time. The method of correc- 
tion used was the same as that employed for 
“correcting” the validity coefficients obtained 
for the capitalization tests. These “corrected” 
validity coefficients are reported in Table 
XVI. 

It may be observed from the validity coefficients presented in Table XVI that in each of the seven experiments in which Form F was used, the “corrected” validity coefficient for Form F is higher than that of the other test used. It may also be noted that the “corrected” validity coefficients for Forms J, L, and N, all of which employ a separate answer sheet, are relatively low. Because there is no test of significance of differences between validity coefficients estimated in this fashion, and also because of the possibility already mentioned that the validity of a test differs from one school to another, it is not possible to establish conclusively the superiority of any one of these tests. However, the fact that the “corrected” validity of Form F is consistently higher than that of the other tests supports the tentative conclusion that Form F is probably superior in validity to the other types of tests used in this study.

In order to obtain at least an approximate description of the general validity of each of the different types of punctuation test used in this study, the validity coefficients were averaged for each of the forms. The procedure used was that employed in averaging the validities of the various capitalization tests. Again it should be emphasized that, because of the possibility that the validity of any test may differ from one school to another, this procedure may be somewhat questionable. But it perhaps gives as good an estimate of the general validity of a test as would any other method. The average validities thus obtained are given in Table XVII.
TABLE XVI


VALIDITY COEFFICIENTS “CORRECTED” FOR DIFFERENCES IN ADMINISTRATION TIME FOR
EACH OF THE PUNCTUATION TESTS

      Experiment   Test No. 1            Test No. 2
Form  Number       N     Vus*   Vcs**    N     Vus*   Vcs**
F     7            154   .66    .69      154   …      .75
G     7            154   .53    .51      154   .65    …
F     8            140   …      .78      135   .80    .82
I     8            135   .63    .66      140   .55    .56
F     9            157   .77    .76      150   .69    .71
J     9            150   …      .50      157   …      .50
I     10           108   .50    .55      124   .58    .57
J     10           124   .53    …        108   …      .56
F     11           149   .76    …        157   .83    .84
K     11           157   .62    …        149   .57    .63
F     12           134   …      .81      124   .84    .85
L     12           124   .46    …        134   …      .50
F     13           140   .76    .78      140   .80    .80
M     13           140   .56    .60      140   .44    .48
F     14           146   .74    .76      146   …      …
N     14           146   .41    …        146   …      .59


* Uncorrected scores used.
** Corrected scores used.


TABLE XVII


THE “CORRECTED” VALIDITY COEFFICIENTS
AVERAGED FOR EACH OF THE
PUNCTUATION TESTS

Form  N      US*    CS**
F     …      …      .78
G     308    .59    .63
I     …      .56    .58
J     …      .52    .51
K     306    .60    .67
L     …      .44    …
M     …      .50    .54
N     292    .44    .56


* Uncorrected scores used.
** Corrected scores used.


Since there is no test of significance of
differences between the averages of validity
coefficients “corrected” by means of the
method used in this study, it cannot be estab-
lished conclusively that one test is superior
to the others. It should be emphasized, how-
ever, that neither does the evidence warrant
the conclusion that all of the tests are equally
effective. It may be observed that the average
validity reported in Table XVII for Form F
is distinctly higher than that of any of the
other tests. While the evidence is not conclu-
sive, if any one type of punctuation test were
to be recommended on the basis of the re-
sults obtained from this study, it would
undoubtedly be Form F.


e. Results of the Item Analysis

The final comparison of the different types of punctuation tests used in this study was made with respect to the validity of the individual situations included in the tests.


From an item analysis of the responses to 
each of these situations, the tetrachoric cor- 
relation coefficient between the responses to 
each of the situations in the criterion test and 
in each of the experimental tests was deter- 
mined. A distribution of these tetrachoric 
r’s was then prepared for each of the tests. 
For each distribution, the twenty-fifth, fifti- 
eth, and seventy-fifth percentiles were com- 
puted. These quartile values are presented in 
Table XVIII for each of the tests in the 
several independent experiments. 
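The item analysis reduces each planted situation to a 2-by-2 table of right and wrong responses on the criterion and on an experimental test, computes a tetrachoric r from that table, and then takes quartiles of the resulting coefficients. The text does not say how the tetrachoric r's were obtained. The sketch below substitutes the cosine-pi approximation and treats a coefficient as indeterminate whenever a cell of the table is empty. Both choices are assumptions, and the function names are hypothetical.

```python
import math
import statistics

def two_by_two(criterion, experimental):
    """Cross-tabulate right (1) / wrong (0) responses of the same pupils
    to one punctuation situation on the criterion and an experimental test."""
    a = b = c = d = 0
    for on_criterion, on_test in zip(criterion, experimental):
        if on_criterion and on_test:
            a += 1  # right on both
        elif on_criterion:
            b += 1  # right on the criterion only
        elif on_test:
            c += 1  # right on the experimental test only
        else:
            d += 1  # wrong on both
    return a, b, c, d

def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric r.
    Returns None (treated as indeterminate) if any cell is empty."""
    if min(a, b, c, d) == 0:
        return None
    return math.cos(math.pi / (1 + math.sqrt((a * d) / (b * c))))

def quartiles(coefficients):
    """25th, 50th, and 75th percentiles of the determinate coefficients."""
    values = [r for r in coefficients if r is not None]
    return statistics.quantiles(values, n=4)
```

Applying `tetrachoric_cos_pi(*two_by_two(...))` to every situation in one test and passing the list to `quartiles` gives the three values reported for that test in a table of this kind.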


TABLE XVIII 


THE VALIDITY OF THE INDIVIDUAL SITUATIONS 
IN EACH OF THE PUNCTUATION TESTS 


          Experi-
          ment                 Tetrachoric r
Form      Number    N*      Q3      Q2      Q1
 ?          7      123     .55     .47     .38
 ?          7      108     .46     .33     .18
 ?          8      144     .64     .54     .48
 ?          8      146     .49     .35     .19
 ?          9      136     .52     .42     .30
 ?          9      140      ?      .29     .18
 ?         10      144     .53     .36     .20
 ?         10      140     .50      ?      .21
 ?         11      139     .63     .52     .40
 ?         11      129     .58     .42     .30
 ?         12      135     .57     .47     .34
 ?         12      123     .45     .35     .22
 ?         13      137     .64     .57     .46
 ?         13      110     .52     .40     .30
 ?         14      131     .55     .47     .34
 ?         14      106     .40     .30     .20


*The number of the 168 planted punctua- 
tion situations for which the tetrachoric r was 
determinate. 
Since, in each of the tests, there were some
punctuation situations for which the tetra- 
choric r was indeterminate, the values given 
in the above table are based on only those 
situations for which the coefficient was deter- 
minate. It may be seen that the median value 
of the tetrachoric r for the situations when 
presented in Form F is in each experiment 
somewhat higher than that of the other test 
used. In some instances this advantage is 
relatively small. However, the fact that the 
observed values consistently favor Form F 
makes it appear probable that Form F does 
have some real superiority in this respect. 


3. CONCLUSIONS 


a. Conclusions Regarding the General 
Procedure 

In the writer’s opinion, the most significant 
outcomes of this investigation are its impli- 
cations for future research. That is, the study 
draws sharp attention to the difficulties that 
will be met by an investigator who attempts 
to determine the relative effectiveness of 
different types of tests. 


The reader is reminded that most of the
comparisons of the validity and reliability of
the tests evaluated in this study are
admittedly inconclusive. This proved to be true
even though the design of the investigation
is believed to be an improvement over the
typical procedures used in such studies of
the past.

It will be recalled that a special effort was 
made, first of all, to incorporate in the tests 
important writing situations in which it is 
known that pupils tend to make frequent or 
crucial errors in capitalization and punctua- 
tion. Secondly, each of the tests evaluated 
includes exactly the same content as the cri- 
terion test. In the third place, the tests were 
administered to a relatively large number of 
pupils. A fourth precaution was the special 
care taken to secure as nearly as possible uni- 
form conditions of administration of the tests 
throughout the several experiments. A fifth 
precaution involved the steps taken to mini- 
mize the practice effect of writing the same 
test more than once, and to equalize the effect 
of the order in which the tests were written. 
Keeping careful records of the amount of time 
required for each pupil to complete each of 
the tests that he wrote constituted a sixth 
precaution. Finally, it was recognized that 
the samples used could not be considered as 
random samples, and appropriate statistical
procedures were employed in computing the 
various correlations. 

Because of these precautions, many of
which constitute improvements over typical
procedures followed in the past, it is believed
that this investigation should be as adequate
as any previous study and may even represent
the best work done to date in attempting to
evaluate tests in this field. However, in spite
of all of these precautions, the results are
inconclusive. It becomes important, therefore,
to examine the evidence carefully to try to
discover reasons for the inconclusiveness of
the results.

The most important reason for the incon- 
clusiveness of the results appears to be the 
fact that the validity and reliability of the 
same test did not remain constant from school 
to school. It will be recalled that in this study 
differences greater than can reasonably be 
attributed to fluctuations in random sampling 
were observed between the validity coeffi- 
cients obtained for the same test in different 
experiments. This was also found to be true 
of the estimated reliability of the same test 
from one experiment to another. If this evi- 
dence had been available previously, the 
writer obviously would have been compelled 
to make important changes in the design of 
this investigation. In view of these wide
differences found for the same test, it was
impossible to secure a meaningful description of
the validity or the reliability of any one type
of test on the basis of these results.

Another factor contributing to the incon- 
clusiveness of the results is the variability 
obtained in the time requirements for the 
same test from one experiment to another. In 
view of this variability, the small number of 
schools involved in any one experiment did 
not provide a sufficient basis for a meaningful 
description of the proper time requirements 
for a given test for schools in general. 

The inconclusiveness of the results obtained 
in this investigation indicates certain prob- 
lems that must be met by future investigators 
who attempt to determine the relative
effectiveness of different types of tests. One highly
important problem is that of obtaining a
meaningful description of the general validity
and reliability of a test. It seems probable
that the true validity and the true reliability
of the same test differ from one school to
another. If this is true, a meaningful
description of the validity or the reliability of a test
cannot be obtained from a sample consisting 
of only a few schools. By chance, each of the 
schools included in the sample might be 
schools in which the test has a relatively low 
validity or reliability. In another sample, the 
test might show a high value for each of these 
measures in each of the few schools. A de- 
scription of the validity or the reliability of 
the test on the basis of the results obtained 
from either of these samples would not be 
very meaningful. 

It is obviously important, therefore, that in 
any study dealing with the evaluation of 
tests, the sample should include a sufficiently 
large number of schools to render stable the 
validity and reliability coefficients obtained 
for each of the tests. The adequacy of the 
sample must be determined, not in terms of 
the number of pupils included, but with re- 
spect to the number of schools involved. An 
even better procedure might be to determine 
the validity and reliability of each test for 
each school separately and arrange such values 
in frequency distributions to show the range 
that might be expected for each of the tests 
with reference to these factors. The compari- 
sons of the various tests would then be based 
upon these frequency distributions. Since this 
procedure means that the school and not the 
pupil must be considered as the unit in each 
sample, future investigators must be prepared 
to design their experiments upon a much 
wider scale, particularly with reference to the 
number of schools involved, than has been 
the typical practice in the past. 


Another important problem that must be 
met in future investigations dealing with the 
evaluation of different types of tests is that 
of determining the relative effectiveness of 
the tests per unit of time. The importance of 
this factor can perhaps best be made clear in 
terms of an illustrative problem. Suppose, for 
example, that we wish to construct a test of 
basic language skills for use in a wide-scale 
testing program. Suppose, further, that we 
have only ten minutes to allot to that part of 
the test devoted to the measurement of cap- 
italization ability. After the skills to be tested 
have been selected, we are confronted with a 
two-fold problem. One aspect of this problem 
is that of selecting the type of test to use; 
the other is that of determining how much 
material to include in the test if it is to have 
its optimum effectiveness when administered 
in ten minutes. Since the optimum amount of
material for the same amount of testing time 
may differ from one type of test to another, 
the choice of the type of test to use is par- 
tially dependent upon the answer to the sec- 
ond part of the problem. That is, other factors 
being constant, we will probably select the 
type of test that will permit us to test the 
widest sampling of skills. This means that we 
must first determine the optimum amount of 
material to include in each test for a time 
limit of ten minutes. 

No very satisfactory technique is available 
as yet for determining the optimum rate of 
administration of a test. Probably the most 
satisfactory procedure in this instance would 
be to administer different lengths of each 
test with the same time limit (ten minutes) 
for each and then compute, for each of the 
tests, measures of validity and reliability for 
each length used. In this way, we can deter- 
mine which of the different amounts of mate- 
rials used in this tryout is most effective for 
each of the tests. Further refinements would 
require additional tryouts. When the opti- 
mum amount of material for the time limit 
of ten minutes has been determined for each 
of the tests, we are ready to make an evalu- 
ation of the different types of tests in a final 
experiment. From the results of this final ex- 
periment, we can then select the technique to 
use in the capitalization section in our test 
of basic language skills. 
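The final selection step in this tryout procedure might be sketched as below. The text names both validity and reliability as measures to compute for each length, but it does not say how to weigh them. The sketch assumes that validity decides and that reliability only breaks ties. The `Tryout` record and the numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Tryout:
    situations: int     # amount of material in this version of the test
    validity: float     # correlation with the criterion under the ten-minute limit
    reliability: float  # reliability estimate under the same limit

def best_length(tryouts):
    """Choose the amount of material with the highest validity,
    using reliability only to break ties (an assumed rule)."""
    return max(tryouts, key=lambda t: (t.validity, t.reliability))

# Hypothetical tryout results for one type of capitalization test.
results = [Tryout(40, 0.58, 0.80), Tryout(60, 0.63, 0.84), Tryout(80, 0.61, 0.86)]
print(best_length(results).situations)
```

With these made-up results the 60-situation version would be chosen. In practice the tryout would be repeated for each type of test before the final experiment.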

The results obtained from this investigation 
draw attention to one factor in particular 
that would further complicate the problem 
just illustrated. Since it was found that the 
time requirements for the same test varied 
considerably from one experiment to another, 
it seems probable that the optimum rate of 
administration differs from school to school 
for the same test. If this is true, the amount 
of material to be included in a test for a given 
length of time should be determined from a 
sample including a sufficiently large number 
of schools to render stable the results ob- 
tained. Again, the number of schools and not 
the number of pupils must be considered in 
determining the size of the sample needed. 
This means that a relatively large number of 
schools should be included in the preliminary 
tryouts and in the final experiment. 

It may even be preferable to determine the 
optimum rate of administration for each 
school separately and then prepare frequency
distributions of these results for each of the
different tests. The amount of material for a 
given time for schools in general would then 
be determined from these distributions. For 
an investigation of this nature to yield con- 
clusive results, it must be planned and con- 
ducted on a much wider scale than have the 
typical studies of the past. This will obvi- 
ously involve considerable cost in terms of 
time, money, and effort. 

In summary, the writer wishes to repeat 
that from the results obtained in this investi- 
gation, it seems probable that the validity, 
reliability, and time required for administra- 
tion of the same test may differ systematically 
from school to school. He wishes also to re- 
emphasize that in future investigations deal- 
ing with the evaluation of different types of 
tests, the school and not the pupil must be 
considered as the unit in determining the size 
of the sample needed to yield conclusive 
results. 


b. Conclusions Regarding the Capitalization 
Tests 

Subject to the limitations of this study and
pending further investigation, the following
tentative conclusions concerning the
capitalization tests are offered on the basis of the
results obtained.

1. The results of this study are such that 
it is impossible to recommend any one 
type of capitalization test over any other 
with any high degree of confidence, but 
if forced to recommend any one type, 
the author would recommend first Form 
A and then Form P. In Form A the 
sentences are presented with all capitals 
omitted. The pupil must locate the 
words needing capitalization and supply 
the missing capitals. In Form P, cer- 
tain of the words are numbered and the 
pupil must decide which should be cap- 
italized and which should not. 

2. Seventh and eighth grade pupils appear
to have relatively little difficulty in
handling the sheer mechanics of the
kinds of separate answer sheets used
with certain of the capitalization tests
evaluated in this study. However, it
required about twenty-five per cent more
time on the average to write the same
capitalization test when the responses
were made on a separate answer sheet
than when they were made on the test
page itself.

3. Correcting the scores on a capitalization
test by subtracting the number of incor- 
rect responses to the situations not re- 
quiring capitals from the number of cor- 
rect responses to the situations requiring 
capitals tends to raise somewhat the val- 
idity coefficient obtained. 


c. Conclusions Regarding the Punctuation 
Tests 
Subject to the limitations of this study and 
pending further investigation, the following 
tentative conclusions concerning the punctu- 
ation tests evaluated are offered on the basis 
of the results obtained. 

1. The results of this study are such that 
it is impossible to recommend any one 
type of punctuation test over any other 
with any high degree of confidence, but 
if forced to recommend one type, the 
author would recommend Form F first 
and then Form K. In Form F the
sentences are presented with all punctuation
omitted. The pupil must locate the situ- 
ations needing punctuation and supply 
the missing punctuation marks. In 
Form K, three punctuation choices are 
given at various places in the sentences, 
and the pupil must underline the correct 
choice at each place. 

2. Punctuation tests which employ a sepa- 
rate answer sheet appear to be relatively 
low in effectiveness with respect to both 
reliability and validity (per unit of 
time). 

3. Seventh and eighth grade pupils appear 
to be able to handle the sheer mechan- 
ics of the kinds of separate answer sheets 
used with certain of the punctuation 
tests used in this study. However, it re- 
quired about twenty per cent more time 
on the average to write the same punc- 
tuation test when the responses were 
made on a separate answer sheet than 
when they were made on the test page 
itself. 

4. Correcting the scores on a punctuation 
test by subtracting the number of incor- 
rect responses to the situations not re- 
quiring punctuation from the number of 
correct responses to the situations re- 
quiring punctuation tends, in general, to 
increase somewhat the validity coefficient
obtained for the test.
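The score correction stated in these conclusions, for both the capitalization and the punctuation tests, is a simple subtraction. In the sketch below, a wrong response to a situation not requiring a mark is taken to mean that the pupil inserted a mark where none belongs. The input format and function name are assumptions made for illustration.

```python
def corrected_score(responses):
    """responses: (requires_mark, pupil_right) for each planted situation.
    Returns a tuple of
      - right responses to situations requiring the mark,
      - wrong responses to situations not requiring it,
      - the corrected score (the first count minus the second)."""
    rights_required = sum(1 for needs, right in responses if needs and right)
    wrongs_not_required = sum(1 for needs, right in responses
                              if not needs and not right)
    return rights_required, wrongs_not_required, rights_required - wrongs_not_required
```

A pupil who supplies every needed mark but also scatters marks where none are required loses credit, which is presumably why the corrected scores tend to track the criterion somewhat better.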
THE EFFECT OF PRACTICE UPON
THE VALIDITY OF A TEST IN 
PUNCTUATION 


1. THE PROBLEM 


As the several types of objective techniques
for the measurement of ability in
capitalization and in punctuation were being
evaluated, it was suggested that the true validity
of a technique might change as the pupils
became more familiar with it. It was thought
that when the pupil is confronted with a new
or unfamiliar type of test, in a large measure
what is being measured may be not only the
pupil's ability to capitalize or to punctuate,
but also his ability to handle the testing
technique itself.

It was the purpose of this experiment to
determine what effect, if any, practice (to
increase familiarity with the form of the test)
has upon the validity of an objective test in
punctuation.


2. THE EXPERIMENTAL PROCEDURE


The same criterion test employed in the
previous experiments was used in this study.
The experimental test used was Form O,
described previously. The test is divided into
two parts as were the other experimental tests.

Three tests, each as long as one section of
the dictation test, were constructed for
practice tests.* These tests resemble Form O
and contain sentences and punctuation
situations similar to those in the criterion test.
The directions for these practice tests are
similar to those for Form O.

The criterion test was administered to over
two hundred pupils in the seventh and eighth
grades in one intermediate school. The
procedure of administration was the same as
that followed in the previous experiments.
Form O-1 was administered a week later to
the same group of pupils. After one day, the
three practice tests were given to these pupils,
one test on each of the next three school days.
Before and after each practice test was
administered, questions concerning the method of
making responses were encouraged and
answered.

On the day following the administration of
the third practice test, and without previous
warning to the pupils, Form O-2 was
administered to the same pupils. The pupils were
directed to record the number of minutes
required to write the test. As in the other
experiments, uncorrected and corrected scores
were established for each pupil on the
criterion test and on each of the experimental
tests. There were 149 pupils who were present
for each of the seven testing periods.


* Copies of these practice tests and of Forms O-1 and O-2
may be found in Appendix C of the complete study on file
in the College of Education Library of the State University
of Iowa.


3. STATISTICAL ANALYSIS OF THE RESULTS 


The reliability of the criterion test and of 
each of the two experimental tests was com- 
puted from the intercorrelations obtained be- 
tween the scores on the equivalent sections of 
the tests. This was done using first the un- 
corrected and then the corrected scores. The 
scores were “corrected” in the same fashion 
as were those on the punctuation tests used 
in the experiments previously described. 

Validity coefficients were obtained for each 
of the tests by computing the correlation 
within classes between the scores on the ex- 
perimental tests and those on corresponding 
parts of the dictation test. The validity co- 
efficients were then “corrected” for differences 
in administration time using the same proce- 
dure employed in the previous experiments. 
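The text does not define the within-classes correlation further. A common form is the pooled within-groups coefficient. Each pupil's two scores are taken as deviations from his own class means before the cross-products and sums of squares are pooled across classes, so that differences between class averages do not enter the coefficient. The following is a minimal sketch under that assumption, with hypothetical names.

```python
import math
from collections import defaultdict

def within_class_r(records):
    """records: (class_id, experimental_score, criterion_score) per pupil.
    Deviations are taken from each class's own means and then pooled."""
    groups = defaultdict(list)
    for cls, x, y in records:
        groups[cls].append((x, y))
    sxy = sxx = syy = 0.0
    for pairs in groups.values():
        mx = sum(x for x, _ in pairs) / len(pairs)
        my = sum(y for _, y in pairs) / len(pairs)
        for x, y in pairs:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
            syy += (y - my) ** 2
    return sxy / math.sqrt(sxx * syy)
```

If one class scored ten points higher on the experimental test without any change in rank order, the ordinary correlation over all pupils would be distorted, while this coefficient would be unaffected.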

Correlations were also computed between 
the scores on the two experimental tests. 

The tetrachoric correlation coefficient be- 
tween the responses to each situation in the 
criterion test and in the experimental tests 
was determined for each situation. Frequency 
distributions were then prepared of the tetra- 
choric r’s for the situations in each of the two 
experimental tests. The quartile values were 
computed for each of the distributions. 


Finally, distributions were prepared of the 
numbers of minutes required by the individual 
pupils to write each of the experimental tests. 
From each of these distributions, the number 
of minutes required for various percentages of 
the pupils to complete the tests was deter- 
mined. 


4. THE EXPERIMENTAL FINDINGS 


For the first experimental test, the
criterion used was the first two sections of the
dictation test. The last two sections served as
the criterion for the second experimental test.
The reliability of the criterion was estimated
by means of the Spearman-Brown prophecy
formula from the average of the six
within-groups intercorrelations computed between
the four approximately equivalent sections of
the dictation test. The estimated reliability
was found to be .92 for both the uncorrected
and the corrected scores.


TABLE XIX


SUMMARY OF THE SCORES ON FORMS O-1 AND O-2


                              Uncorrected       Corrected
                                 Scores           Scores
Test                          A.M.    S.D.     A.M.    S.D.
Form O-1 (before practice)   53.90    8.60    44.68    9.67
Form O-2 (after practice)    57.95    9.57    50.05   10.14


TABLE XX


SUMMARY OF THE DISTRIBUTIONS OF THE TIME REQUIRED BY VARIOUS PERCENTAGES OF THE
PUPILS TO COMPLETE FORMS O-1 AND O-2


                                                               Estimated*
                                Number of Minutes               Optimum
Test                          25%     50%     75%     9?%         Time
Form O-1 (before practice)   12.58   14.25   17.15   20.51       19.43
Form O-2 (after practice)     7.42    8.37    9.51   11.39        9.97


* The number of minutes required for eighty-four per cent of the pupils to complete the test.

The means and standard deviations of the
scores on Forms O-1 and O-2 are given in
Table XIX. Form O-1 is the test
given before and Form O-2 that given after
the practice tests were administered.

The scores on the experimental tests were 
“corrected” in the same manner as were those 
on the criterion test. The mean scores were 
somewhat higher after practice than they were 
before practice. Either the pupils’ ability to 
punctuate improved, or the pupils improved 
in their ability to handle the testing technique. 

The correlations obtained between the 
scores (before and after practice) on the ex- 
perimental tests were found to be .84 and .76 
respectively when the uncorrected and the 
corrected scores were used. 

Table XX presents a summary of the 
amount of time required by the pupils to 
write each of the experimental tests. It may 
be seen that after practice the pupils required 
less time to write the test than they did before 
this practice was given. 
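Table XX and its footnote define the estimated optimum time as the number of minutes needed for eighty-four per cent of the pupils to finish. The study worked from distributions of the recorded minutes. The sketch below instead interpolates linearly between ordered individual times, which is an assumed simplification, and the function names are hypothetical.

```python
def minutes_for_percent(times, percent):
    """Minutes by which `percent` per cent of pupils had finished, by linear
    interpolation between the ordered individual completion times."""
    xs = sorted(times)
    position = (len(xs) - 1) * percent / 100
    lower = int(position)
    upper = min(lower + 1, len(xs) - 1)
    return xs[lower] + (xs[upper] - xs[lower]) * (position - lower)

def estimated_optimum_time(times):
    """The study's 'estimated optimum time': minutes needed by 84 per cent."""
    return minutes_for_percent(times, 84)
```

Calling `minutes_for_percent` with 25, 50, and 75 on each form's recorded times yields the other columns of a summary like Table XX.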

Reliability coefficients estimated for the 
two experimental tests are reported in Table 
XXI. It may be observed that the reliability 
of the test given after the practice is higher 
than that of the test given before the prac- 
tice, both when the corrected and when the 
uncorrected scores are used. 

TABLE XXI


ESTIMATED RELIABILITY COEFFICIENTS FOR
FORMS O-1 AND O-2


Test                          US*    CS**
Form O-1 (before practice)    .82    .82
Form O-2 (after practice)     .91    .95


* Uncorrected scores used.
** Corrected scores used.


The reliability coefficients given in Table
XXI are those estimated from the scores
when the tests were administered under
work-limit conditions. To make the reliabilities for
the two tests more comparable, they were
“corrected” for differences in administration
time. This was done by estimating, by means
of the Spearman-Brown prophecy formula,
what the reliability would be for a similar
test of such a length that, when administered
under work-limit conditions, eighty-four per
cent of the pupils would finish the test in ten
minutes. Table XXII gives the reliability
coefficients for each of the tests when
“corrected” for differences in administration time.
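Both uses of the Spearman-Brown prophecy formula in this section can be sketched with one function. For the criterion, the six intercorrelations among the four sections are averaged and stepped up by a factor of four. An average of about .74 would give the .92 reported; that .74 is back-calculated here for illustration and is not taken from the study. For the time correction, the sketch assumes that testing time is proportional to the amount of material, so the test is rescaled by k = 10 / T84, where T84 is the estimated optimum time in Table XX. Function names are hypothetical.

```python
import itertools
import statistics

def spearman_brown(r, k):
    """Spearman-Brown prophecy: reliability of a test whose length is
    multiplied by k (k > 1 lengthens it, k < 1 shortens it)."""
    return k * r / (1 + (k - 1) * r)

def criterion_reliability(section_r):
    """section_r maps each pair (i, j), i < j, of dictation-test sections to
    their within-groups correlation. The pairwise r's are averaged and the
    average is stepped up to the full number of sections."""
    sections = sorted({s for pair in section_r for s in pair})
    r_bar = statistics.mean(section_r[pair]
                            for pair in itertools.combinations(sections, 2))
    return spearman_brown(r_bar, len(sections))

def time_corrected_reliability(r, minutes_84, limit=10.0):
    """Assumes time is proportional to length, so the test is rescaled by
    k = limit / minutes_84 (84 per cent would then finish within `limit`)."""
    return spearman_brown(r, limit / minutes_84)
```

Because the estimated optimum time for Form O-2 (9.97 minutes) is almost exactly ten minutes, k is close to 1 and its reliabilities are essentially unchanged by the correction, as Table XXII shows.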


TABLE XXII


RELIABILITY COEFFICIENTS FOR FORMS O-1 AND
O-2 WHEN “CORRECTED” FOR DIFFERENCES
IN ADMINISTRATION TIME


Test                          US*    CS**
Form O-1 (before practice)    .72    .72
Form O-2 (after practice)     .91    .95


* Uncorrected scores used.
** Corrected scores used.


It will be noted that the difference between
the reliabilities of the two tests is relatively
large both for the uncorrected and for the
corrected scores. These differences are
unquestionably greater than can reasonably be
attributed to chance. This may be interpreted
as indicating that increased familiarity with
the technique resulted in an increased
reliability of the scores.

Validity coefficients obtained for the tests
when administered under work-limit
conditions are given in Table XXIII. It may be
observed that the coefficients for the two tests
are identical for the uncorrected scores and
that they differ by only .04 when the
corrected scores are used.


TABLE XXIII


VALIDITY COEFFICIENTS OBTAINED FOR
FORMS O-1 AND O-2


Test                          US*    CS**
Form O-1 (before practice)    .53    .65
Form O-2 (after practice)     .53    .61


* Uncorrected scores used.
** Corrected scores used.


To make these validity coefficients more
comparable, they were “corrected” for
differences in administration time by means of the
method used in the experiments previously
described. This was done in order to determine
the validity of each test when the amount of
similar material is such that eighty-four per
cent of the pupils would finish the test in ten
minutes when administered under work-limit
conditions. It will be noted that the
differences between these “corrected” validities for
the two tests are small.

On the basis of these results, and pending
further investigation, the hypothesis may be
retained that increased familiarity with a
testing technique does not significantly affect
the validity of the technique (per unit of
time).


TABLE XXIV


THE VALIDITY COEFFICIENTS FOR FORMS O-1
AND O-2 WHEN “CORRECTED” FOR
DIFFERENCES IN ADMINISTRATION TIME


Test                          US*    CS**
Form O-1 (before practice)    .49    .60
Form O-2 (after practice)     .53    .61


* Uncorrected scores used.
** Corrected scores used.
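The validity correction is said to follow the method of the earlier experiments, which is not restated in this section. One standard formula for the correlation of a test with a fixed criterion when the test's length is multiplied by k is r_c(kx) = r_cx * sqrt(k) / sqrt(1 + (k - 1) * r_xx). The sketch applies this formula, assuming k = 10 / T84 as for the reliabilities, the Table XXI reliability of .82, and the Form O-1 validities in Table XXIII. It returns values that round to the Form O-1 entries in Table XXIV. That agreement suggests, but does not establish, that a formula of this kind was used. Function names are hypothetical.

```python
import math

def lengthened_validity(r_xc, r_xx, k):
    """Correlation with a fixed criterion after the test's length is
    multiplied by k, given its validity r_xc and reliability r_xx."""
    return r_xc * math.sqrt(k) / math.sqrt(1 + (k - 1) * r_xx)

def time_corrected_validity(r_xc, r_xx, minutes_84, limit=10.0):
    """Assumes time is proportional to length: k = limit / minutes_84."""
    return lengthened_validity(r_xc, r_xx, limit / minutes_84)

# Form O-1 (before practice): validity from Table XXIII,
# reliability from Table XXI, 84-per-cent time from Table XX.
print(round(time_corrected_validity(0.53, 0.82, 19.43), 2))  # 0.49, uncorrected scores
print(round(time_corrected_validity(0.65, 0.82, 19.43), 2))  # 0.6, corrected scores
```

Shortening Form O-1 to a ten-minute length lowers its validity, while Form O-2, whose optimum time is already about ten minutes, is left nearly unchanged. This is why the corrected validities of the two forms end up close together.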


Comparison of the validities of the two 
tests with respect to the individual punctu- 
ation situations may be made from the results 
given in Table XXV. It should be pointed 
out that for each of the two tests, the
tetrachoric correlation for about one-fourth of the
situations was indeterminate. For those
situations for which the tetrachoric r was
determinate, the median validity is just slightly
higher in Form O-2 than it is in Form O-1.
Comparisons of other quartile values reveal
about the same small differences.


TABLE XXV 


VALIDITY OF THE INDIVIDUAL SITUATIONS IN 
ForM O-1 AND IN ForM O-2 


                 Form O-1                     Form O-2
             (before practice)             (after practice)

          TS*     SR**    SN***        TS*     SR**    SN***
Q3        .44     .45     .28          .41     .42     .35
Q2        .28     .30     .20          .33     .34     .30
Q1        .15     .15     .14           ?       ?       ?


* Total situations. 
** Situations requiring punctuation. 
*** Situations not requiring punctuation. 


5. CONCLUSIONS 


From the results obtained, and subject to 
the limitations of this experiment, the follow- 
ing conclusions seem warranted. 


1. Increased familiarity with an objective 
technique for the measurement of ability 
to punctuate tends to increase the reli- 
ability of the test employing the 
technique. 

2. Increased familiarity with an objective
technique for the measurement of abil- 
ity in punctuation reduces considerably 
the amount of time required to write a 
test of a given length. 

3. In spite of its marked effect on the reli- 
ability of the scores, practice does not 
seem to affect the validity per unit of 
time appreciably. However, since the 
results are not conclusive, the question 
must be left open to future investigation. 


THE IMPORTANCE OF THE
PROOF READING FACTOR IN
AN OBJECTIVE TEST IN
PUNCTUATION


1. THE PROBLEM 


Many of the objective punctuation tests 
used have been of the so-called proof reading 
type. In a proof reading test, printed material 
containing punctuation situations is presented 
to the pupil who is directed to supply needed 
punctuation where it has been omitted, cross 
out unnecessary punctuation marks, and
correct any incorrect punctuation. Two types of
situations may be presented in such a proof 
reading test—those requiring punctuation, 
and those not requiring punctuation but in 
which pupils do frequently tend to supply 
punctuation. Each of these may be presented 
either in correct or in incorrect form. 

It is possible that pupils tend to accept as 
correct the punctuation they see in print. If 
this is true, a pupil may make more errors in 
a proof reading test than he would in his own 
writing. And his score on a given proof read- 
ing test would depend not only upon his 
actual punctuation ability, but also upon his 
“proof reading ability” and upon the relative 
number of situations that were presented in 
correct and in incorrect form. 

It was the purpose of this experiment to 
attempt to answer the following questions: 
(1) What is the relative difficulty of punctu- 
ation situations when presented in correct 
form and when presented in incorrect form 
in a proof reading test? (2) What is the rela- 
tionship between the number of correct re- 
sponses made by the individual pupils to cer- 
tain punctuation situations when presented in 
correct form and to the same situations when 
presented in incorrect form? 


2. THE EXPERIMENTAL PROCEDURE


a. Construction of the Tests 

The total number of punctuation skills 
which may be tested is exceedingly large. 
Therefore, since it was thought desirable to 
include in the tests several illustrations of 
each kind of situation, it was decided to limit 
the study to a few of the more or less com- 
mon punctuation skills. Three apostrophe and 
six comma skills were selected for use in the
tests. 

The first step in the construction of the 
test was to build sentences including applica- 
tions of each of these punctuation skills. Six 
illustrations of each of the nine different kinds 
of punctuation skills were included in the 
test. Thirty sentences were constructed,
including a total of fifty-four planted
punctuation situations. An effort was made to make
the vocabulary and structure of the sentences 
appropriate for the seventh and eighth grade 
levels. 

The sentences were then arranged in more 
or less random order and the fifty-four situ- 
ations numbered consecutively. Two copies of 
the sentences were then prepared. In the first
copy, a random selection of approximately
one-half of the situations was punctuated
correctly. These same situations were
punctuated incorrectly in the other copy. The
remaining situations were punctuated
incorrectly in the first copy and correctly in the
second copy. The first and second copies
were then labeled Test I and Test II
respectively.

Each of the two tests thus resembles the 
usual type of proof reading test. Each test 
contains exactly the same sentences and the 
same number and kind of punctuation situa- 
tions. Both of the tests were given the same 
title pages with exactly the same directions on 
each. The two tests differ only in that each 
situation is punctuated correctly in one test 
and incorrectly in the other. In each test 
approximately half of the situations are pre- 
sented in correct form and the others in 
incorrect form.* 
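The construction just described amounts to a complementary assignment: every situation is printed correctly in exactly one of the two tests. The Python sketch below illustrates it under stated assumptions. Exactly 27 of the 54 situations (the study says only "approximately one-half") are printed correctly in Test I. The random choice is made with Python's `random` module. The function name `build_complementary_forms` is hypothetical.

```python
import random

def build_complementary_forms(n_situations=54, seed=None):
    """Hypothetical helper: for each planted situation, record whether it is
    printed correctly (True) or incorrectly (False) in each test.

    Assumes exactly half the situations are correct in Test I; the study
    reports only "approximately one-half".
    """
    rng = random.Random(seed)
    situations = range(1, n_situations + 1)
    correct_in_test_1 = set(rng.sample(situations, n_situations // 2))
    test_1 = {s: s in correct_in_test_1 for s in situations}
    # Test II reverses every situation, so each appears once in each form.
    test_2 = {s: not is_correct for s, is_correct in test_1.items()}
    return test_1, test_2

test_1, test_2 = build_complementary_forms(seed=1938)
print(sum(test_1.values()), sum(test_2.values()))  # 27 27
```

Because Test II is derived from Test I rather than drawn separately, the two tests necessarily contain the same sentences and the same number and kind of situations, differing only in form.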


b. Administering and Scoring the Tests


Test I and Test II were written by 315
seventh and eighth grade pupils of two public 
schools in Iowa under the same conditions, 
but a week apart. This number includes all 
enrolled in these grades excepting those who 
were absent on one or both of the testing 
days. No warning was given to the teachers 
and the pupils that either of the tests was to 
be given. Carefully prepared directions were 
discussed thoroughly with the examiner be- 
fore the tests were administered. 

Each of the tests was scored for the re- 
sponses to the fifty-four planted punctuation 
situations using stencil-type keys. For each 
pupil, the response to each situation was re- 
corded as right or wrong. A separate record 
was kept of the responses to the situations 
when presented in correct and in incorrect 
form. 
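A minimal sketch of this scoring rule, assuming each response is recorded only as "changed" or "left unchanged". The sketch does not model a pupil who alters an incorrect situation into a different incorrect form. The names `is_right` and `tally_errors` are hypothetical.

```python
from collections import Counter

def is_right(printed_correctly: bool, pupil_changed: bool) -> bool:
    """Right = leaving a correct situation alone, or changing an incorrect one."""
    if printed_correctly:
        return not pupil_changed
    return pupil_changed

def tally_errors(responses):
    """Count errors separately for situations printed in correct and incorrect form.

    `responses` is an iterable of (printed_correctly, pupil_changed) pairs.
    """
    errors = Counter()
    for printed_correctly, pupil_changed in responses:
        if not is_right(printed_correctly, pupil_changed):
            form = "correct form" if printed_correctly else "incorrect form"
            errors[form] += 1
    return errors

sample = [(True, False), (True, True), (False, False), (False, True), (False, False)]
print(tally_errors(sample))  # Counter({'incorrect form': 2, 'correct form': 1})
```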


3. THE EXPERIMENTAL RESULTS 


a. The Results on the Two Tests 


To determine whether or not there was any 
practice effect evident in the scores on Test 
II, the total number of correct and incorrect 
responses was determined for each of the two 
tests. These results are given in Table XXVI.

The results indicate that the pupils per-
formed no better on Test II than they did on
Test I. It was assumed, therefore, that there
was little, if any, practice effect.

* A copy of each of these tests will be found in Appendix C
of the ... on file in the College of Education,
State University of Iowa.

TABLE XXVI

SUMMARY OF THE RESPONSES TO ALL OF THE SITUATIONS IN EACH OF THE TWO TESTS

          Correct      Per Cent     Incorrect    Per Cent
Test      Responses    of Total     Responses    of Total
I         12,211       71.78        4789         28.15
II        12,149       71.42        4861         28.57


b. Relative Difficulty of Two Forms of Pres-
entation of the Same Punctuation Situ-
ations

Since one of the purposes of the study is to
determine the relative difficulty of the same
punctuation situations when presented in cor-
rect and in incorrect form, an analysis was
made of the errors made by the pupils on
each of the two forms of presentation. The
results of this analysis are given in Table
XXVII for each of the nine different punctu-
ation rules involved.


TABLE XXVII

SUMMARY OF THE ERRORS MADE ON TWO
FORMS OF PRESENTATION OF THE SAME
PUNCTUATION SITUATIONS

                       Errors on the    Errors on the
Rule      Total        Correct          Incorrect
          Errors       Form             Form
A          933            45               888
B          ...           ...               ...
C          ...           ...               ...
D          ...           ...               ...
E          ...           ...               ...
F          ...           ...               ...
G          ...           ...               ...
H          ...           ...               ...
I          ...           ...               ...
Total     9657          2127              7530


Table XXVII may be read in the following
manner. The 315 pupils made 45 errors on
the situations illustrating Rule A, for example,
when they were presented in correct form,
and 888 on the same situations presented in
incorrect form. The results for each of the
other rules may be interpreted in the same
manner. It may be noted that for each of
these rules the pupils made more errors on
the incorrect form of the situations than they
did on the correct form.

There were 7530 errors resulting from the
failure of the pupils to change situations pre-
sented in incorrect form and only 2127 errors
from making changes in situations already
correct. In other words, the pupils made more
than three times as many errors by accepting
situations presented in incorrect form as they
did by changing correct situations. These re-
sults indicate that punctuation situations pre-
sented in incorrect form in a proof reading
test are relatively more difficult than the same
situations presented in correct form. It seems
reasonable to conclude, therefore, that pupils
have a tendency to accept as correct the punc-
tuation they see in printed materials.
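The totals quoted above can be checked with a few lines of arithmetic. This sketch uses only figures stated in the text: 315 pupils, 54 situations, and 7530 and 2127 errors.

```python
PUPILS = 315
SITUATIONS = 54

responses_per_test = PUPILS * SITUATIONS      # responses to planted situations in each test
errors_on_incorrect_form = 7530               # incorrect situations left unchanged
errors_on_correct_form = 2127                 # correct situations changed
total_errors = errors_on_incorrect_form + errors_on_correct_form

ratio = errors_on_incorrect_form / errors_on_correct_form
print(responses_per_test, total_errors, round(ratio, 2))  # 17010 9657 3.54
```

A ratio of about 3.5 is what the text summarizes as "more than three times as many errors".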

Pupils who have a thorough mastery of the 
skill involved in handling a given punctuation 
situation should not be affected by the form 
in which the situation is presented. That is, 
their responses should be consistently correct. 
Similarly, pupils who have a definite miscon- 
ception of the correct punctuation for a given 
situation should consistently respond incor-
rectly to that situation, irrespective of the
form in which it is presented. Since the form 
of the presentation presumably should not 
affect the accuracy of the responses of either 
of these classes of pupils, their responses were 
eliminated in an attempt to secure a purer 
measure of the relative difficulty of the two 
forms of presentation. 

Therefore, the inconsistent responses to the 
two forms of presentation were analyzed. The 
responses to the same situation when pre- 
sented in correct and in incorrect form may 
be inconsistent in either of two ways: (1) the 
pupil may change the correct form and in- 
consistently change also the incorrect form, 
or (2) he may accept the correct form and 
inconsistently also accept the incorrect form. 

Since there are six situations illustrating 
each rule, the 315 pupils made 1890 pairs of 
responses to the situations for each rule. A 
pair of responses indicates one response to the 
correct form and another to the incorrect 
form of the same situation. Table XXVIII 
presents the results of this analysis for each 
of the nine rules. 
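The four possible outcomes for a pair of responses can be written out explicitly. The sketch below labels each pair using the same right/wrong coding as the footnotes to Table XXVIII. It assumes, as above, that a wrong response to a correct situation means the pupil changed it, and a wrong response to an incorrect situation means the pupil left it alone. The function name `classify_pair` is hypothetical. The last lines reproduce the pair counts given in the text.

```python
def classify_pair(right_on_correct: bool, right_on_incorrect: bool) -> str:
    """Label one pupil's pair of responses to the same situation."""
    if right_on_correct and right_on_incorrect:
        return "consistent: right on both forms"
    if not right_on_correct and not right_on_incorrect:
        return "consistent: wrong on both forms"
    if not right_on_correct:
        # Changed the correct form and also changed the incorrect form.
        return "inconsistent type 1: changed both forms"
    # Accepted the correct form and also accepted the incorrect form.
    return "inconsistent type 2: accepted both forms"

PUPILS, SITUATIONS_PER_RULE, RULES = 315, 6, 9
pairs_per_rule = PUPILS * SITUATIONS_PER_RULE   # pairs for each rule
total_pairs = pairs_per_rule * RULES            # pairs for all nine rules
print(pairs_per_rule, total_pairs)  # 1890 17010
```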

It will be noted that thirty-eight per cent
of the total number of pairs of responses were
of the inconsistent type. Of these, three per
cent were answered incorrectly on the correct
form and correctly on the incorrect form.
Thirty-five per cent were responded to cor-
rectly on the correct form and incorrectly on
the incorrect form. Thus, there is a ratio of
3 to 35 between the number of pairs of re-
sponses in which the pupils were in error on
the correct form only and those in which they
made errors only on the incorrect form of
presentation. This is additional evidence that
pupils tend to accept as correct the punctua-
tion they see in print, and that the same situ-
ations presented in incorrect form are rela-
tively more difficult than when presented in
correct form.


TABLE XXVIII

THE PERCENTAGE OF THE PAIRS OF RESPONSES
THAT WERE INCONSISTENT*

Rule      Total     Type 1**    Type 2***
A          ...          1           46
B           50          2           48
C           52          4           48
D          ...          0            3
E          ...          1           52
F          ...          3           23
G          ...          5           37
H           35          6           29
I          ...          7           29
Total       38          3           35

* Figures given are rounded percentages of 1890 pairs of
responses for each rule, and of 17,010 pairs for the total.
** Response wrong on correct form and right on incorrect.
*** Response right on correct form and wrong on incorrect.

Still another picture of the relative diffi- 
culty of the same situations presented in the 
two forms may be obtained from the distri- 
butions of the number of correct responses 
made by each pupil to the two forms of 
presentation. 


TABLE XXIX

SUMMARY OF THE DISTRIBUTIONS OF THE NUM-
BER OF CORRECT RESPONSES TO THE TWO
FORMS OF PRESENTATION OF THE SAME
PUNCTUATION SITUATIONS

              Number of        Number of
              Rights on        Rights on
Percentile    Correct Form     Incorrect Form
90            52.23            41.09
80            50.57            36.34
70            49.20            33.19
60            48.19            30.69
50            47.32            28.43
40            46.53            26.12
30            45.74            23.79
20            44.37            21.69
10            42.12            17.72


Table XXIX shows that the median num-
ber of correct responses to the fifty-four situ-
ations when presented in correct form was
47.32. The median number of correct re-
sponses to the same situations presented in
incorrect form was only 28.43. Other per-
centile values may be interpreted similarly.
It may be noted that the ninetieth percentile
of the scores on the incorrect form is below
the tenth percentile of those on the correct
form. The mean of the distribution of the
number of correct responses to the situations
presented in correct form was found to be
47.28. The mean of the distribution of the num-
ber of correct responses to the situations pre-
sented in incorrect form was found to be only
29.94.
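For readers who want to produce summaries like Table XXIX from their own score lists, here is a sketch using Python's `statistics` module (`statistics.quantiles` needs Python 3.8 or later). That function uses its own interpolation rule, which need not match the method used in the study. Only three rows of Table XXIX are copied, to restate the comparison made in the text. The names `summarize` and `TABLE_XXIX` are hypothetical.

```python
import statistics

def summarize(scores):
    """Mean, median, and the nine decile points (10th ... 90th) of a score list."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "deciles": statistics.quantiles(scores, n=10),
    }

# Percentile: (rights on correct form, rights on incorrect form), from Table XXIX.
TABLE_XXIX = {90: (52.23, 41.09), 50: (47.32, 28.43), 10: (42.12, 17.72)}

# The 90th percentile on the incorrect form lies below the 10th on the correct form.
print(TABLE_XXIX[90][1] < TABLE_XXIX[10][0])  # True
```

Given a list of 315 per-pupil scores for one form, `summarize(scores)["deciles"]` returns the nine cut points that correspond to the rows of the table.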


c. Relation Between the Scores on Two
Forms of Presentation of the Same
Punctuation Situations

The second purpose of the study is to 
determine the relationship between the num- 
ber of correct responses made by the indi- 
vidual pupils to the same punctuation situa- 
tions when presented in correct and in incor- 
rect form. Accordingly, the correlation was 
computed between the number of correct 
responses on each of the two forms of pres- 
entation. This was done for each group of 
six situations illustrating each of the nine 
punctuation rules involved and also for the 
total of fifty-four situations included in the 
test. The correlation coefficients thus obtained 
are given in Table XXX. Each of the cor- 
relations is based upon 315 cases. 


It will be observed that the magnitude of
the correlation coefficients reported in Table
XXX varies considerably from one type of
situation to another. However, the correlation
between the number of correct responses to
all of the situations when presented in correct
and in incorrect form was found to be only
.08. This indicates that for the total group of
fifty-four punctuation situations, there was
little if any relationship between the number
of correct responses to the situations when
presented in correct form and when presented
in incorrect form.


TABLE XXX

CORRELATION COEFFICIENTS OBTAINED BETWEEN
THE SCORES ON TWO FORMS OF PRESENTA-
TION OF THE SAME PUNCTUATION SITUATIONS
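The article does not name the coefficient it computed. Assuming the usual Pearson product-moment r, the calculation for one pair of score lists (one number per pupil on each form) can be sketched as below. The function name and the five pupils' scores are hypothetical. Python 3.10 and later also provide `statistics.correlation`.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for five pupils: rights on correct form, rights on incorrect form.
correct_form = [50, 47, 52, 45, 48]
incorrect_form = [30, 22, 35, 28, 25]
print(round(pearson_r(correct_form, incorrect_form), 2))
```

Run once per rule (six situations each) and once on the 54-situation totals, this yields one coefficient per row of a table like Table XXX.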


d. Percentage of Errors Resulting From Lack
of Proof Reading Ability

It seems reasonable to assume that in a
proof reading test in punctuation, the incon-
sistency of the responses to the same situa-
tions presented in two different forms must be
the result of one or both of two factors,
chance and lack of proof reading ability.
Pupils would not intentionally change both 
the correct and the incorrect forms, nor would 
they intentionally accept both forms. But 
even though pupils certainly would not inten- 
tionally be inconsistent, chance would operate 
to result in some inconsistent responses. 

It will be recalled that three per cent of 
the pairs of responses were caused by chang- 
ing both the correct and the incorrect forms 
of presentation (see Table XXVIII). It 
seems reasonable to assume that these three 
per cent could not have been caused by a lack 
of proof reading ability, because the pupil 
made a conscious, overt response to each situ- 
ation on both forms of presentation. They 
could not indicate mastery of the punctuation 
skills involved, for one of the responses in 
each pair was incorrect. Nor could they have 
resulted from a definite misconception of the 
correct punctuation needed, because one of 
the responses in each pair was correct. Such 
inconsistent pairs of responses must be due 
either to a lack of knowledge concerning the 
correct use of these skills or to some variable 
extraneous factor. In either case, they were 
due to chance. 

It was shown in Table XXVIII that thirty-
five per cent of all the pairs of responses to
the same situations presented in two forms
were inconsistent pairs resulting from accepting
the situations as they were printed and making
no changes. If chance was responsible for
three per cent of the pairs of responses in one
category, it seems reasonable to believe that
chance would not be responsible for more
than three per cent in the other. If this as-
sumption is valid, then thirty-two per cent
(thirty-five per cent minus three per cent) of
the pairs may be said to have been inconsistent
because of factors other than chance. It
seems probable that a lack of proof reading
ability is the principal one of these factors.

If we assume that thirty-two per cent of 
the pairs of responses were inconsistent be- 
cause of a lack of proof reading ability, then 
approximately fifty-six per cent of all of the 
errors can be attributed to faulty proof read-
ing.* These errors were the result of the fail-
ure of the pupils to change a situation pre- 
sented in incorrect form. At the same time, 
they left the situation as it appeared when 
presented in correct form. It seems reasonable 
that these errors were not made because of 
wrong habits of punctuation, but more prob- 
ably because of the failure of the pupil to 
note the error which he might have been able 
to correct had his attention been drawn to it, 
or which he probably would not have made in 
his own writing. In other words, these errors 
were probably the result of a lack of proof 
reading ability. 
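The fifty-six per cent estimate can be retraced step by step. The sketch uses only figures given in the text: 35 and 3 per cent from Table XXVIII, 17,010 pairs, and 9657 total errors. It encodes the author's assumption that chance accounts for as many accepted-both-forms pairs as changed-both-forms pairs.

```python
TOTAL_PAIRS = 17_010      # 315 pupils x 54 situations
TOTAL_ERRORS = 9657       # errors on both forms combined
TYPE_2_PCT = 35           # accepted both forms (right on correct, wrong on incorrect)
TYPE_1_PCT = 3            # changed both forms, attributed to chance

# Author's assumption: chance produces no more type-2 pairs than type-1 pairs.
beyond_chance_pct = TYPE_2_PCT - TYPE_1_PCT
pairs_beyond_chance = round(TOTAL_PAIRS * beyond_chance_pct / 100)
# Each such pair contains exactly one error, on the incorrect form.
share_of_errors = pairs_beyond_chance / TOTAL_ERRORS
print(pairs_beyond_chance, f"{share_of_errors:.0%}")  # 5443 56%
```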


e. Summary of the Results 

It was found that the 315 pupils made 
7530 errors on the punctuation situations 
presented in incorrect form and only 2127 on 
the same situations presented in correct form. 


It was found that in three per cent of the 
pairs of responses to the same situations the 
response to the correct form was wrong and 
that to the incorrect form was right. In 
thirty-five per cent of the pairs of responses 
to the same situations, the response to the 
correct form was right and that to the in- 
correct form was wrong. 


The mean number of correct responses to 
the situations presented in correct form was 
found to be 47.28; the mean number of cor- 
rect responses to the same situations presented 
in incorrect form was found to be only 29.94. 
The medians were found to be 47.32 and 
28.43 respectively. 

For the situations representing the nine
different punctuation rules involved, the cor-
relations between the number of correct re-
sponses to the situations presented in correct
form and those to the same situations
presented in incorrect form ranged in value
from less than .10 to .67. For the entire test,
this correlation was found to be only .08.

* Solution: Thirty-two per cent of 17,010 equals roughly
5443 pairs of responses. In each of these pairs, an error was
made only when the situation was presented in incorrect form,
making a total of 5443 errors. Dividing 5443 by 9657 (the
total number of errors), we find that these thirty-two per cent
of the pairs of responses contain about fifty-six per cent of
the total number of errors.

Of the total number of errors made by all 
of the pupils on all of the situations presented 
in both forms, fifty-six per cent were attrib- 
uted to faulty proof reading. 

4. CONCLUSIONS 

On the basis of the above findings, and 
subject to the limitations of this study, the 
following conclusions seem relevant. 

1. In a typical proof reading test in punc-
tuation, the same situations are rela-
tively more difficult when presented in
incorrect form than when presented in
correct form.

2. There is little, if any, relationship be-
tween the number of correct responses
made by the pupils to the same situa-
tions when presented in correct and in
incorrect form in a typical proof reading
test in punctuation.

3. Approximately half of the errors on a
typical proof reading test in punctuation
can probably be attributed to a lack of
proof reading ability on the part of the
pupils.



[Vol. 8, No. ; 


THE DEVELOPMENT OF A SPELLING TEST FOR USE 
IN SECOND GRADE 
ALBERT GRANT 


Psychological Laboratory, Public Schools 
Cincinnati, Ohio 


This report deals with an attempt to ap- 
praise the outcomes of instruction in spelling 
in grade II in Cincinnati. A fifty-word spell- 
ing test was constructed for this purpose. 
The procedures used in constructing the test,
as well as the results of the test for over four 
thousand second-grade pupils, are described 
in the discussion which follows. Suggestions 
indicating how the test may be used by teach- 
ers elsewhere are also given. 

How the Test Was Constructed.—In select- 
ing items for the test the major aims of spell- 
ing instruction were used as criteria. These 
aims are generally conceded to be twofold: 
(1) meeting the pupils’ spelling needs as these 
occur in their written work, and (2) develop- 
ing in pupils a mastery of some of the basic 
words which it is important for children to 
know how to spell. It was possible to select 
words consistent with the first of these cri- 
teria through information supplied by second- 
grade teachers. Each teacher sent to the cen- 
tral office a list of the words which her pupils 
had needed frequently in their written work 
during the first three months of the school 
year. These lists were tabulated to secure a 
master list of all the different words. The 
number of times each word occurred was also 
determined so that the words could be ar- 
ranged in order of frequency. 

To meet the second criterion, the master 
list was reduced to include only words which, 
according to various investigations, it is im- 
portant for pupils to know how to spell. 
Coleman's¹ summary of investigations was
used in this connection; all words not appear- 
ing in his list were eliminated. Fifty words 
were then selected from the resulting master 
list for use in the test. With a few exceptions, 
the words which occurred most frequently in 
the teachers’ lists were chosen. The particular 
words selected appear in Table II. The final 
step in constructing the test consisted in pre- 
paring fifty simple sentences, each illustrating 
one of the words. 


¹ William H. Coleman, A Critique of Spelling Vocabulary
Investigations. Greeley: Colorado State Teachers
College, 1931.


Results of the Test in Grade II.—The test
was administered on a city-wide basis early 
in May, 1938. After being scored by the 
teachers, the test papers were returned to the 
central office for analysis. Results of this 
analysis are given in Tables I and II. Table I 
gives a frequency distribution of the scores 
(number of words spelled correctly) made by 
the 4520 second grade pupils tested. Table II 
shows the per cent of pupils in a sampling of 
525 who spelled each word correctly. For ex- 
ample, 27.4 per cent spelled the word “found” 
correctly, whereas 81.3 per cent spelled the 
word “book” correctly. On the average the 
words were spelled correctly by about one- 
half of the group. 


Possible Future Use of the Test.—It is 
hoped that the test described will be found 
useful by teachers elsewhere as a basis for 
appraising the results of instruction in spell- 
ing in second grade. The data given in 
Tables I and II may be used as a basis for 
interpreting pupils’ scores. 


The test may be used as a whole or in part. 
The following statements are suggested for 
use in interpreting scores where the test is 
used as a whole. 


1. Pupils who spell 28 of the 50 words 
correctly are achieving on a par with the 
median for second grade pupils in Cin- 
cinnati near the end of the school year. 

2. Pupils who spell 12 of the 50 words 
correctly are achieving on a par with the 
25th percentile for second grade pupils 
in Cincinnati near the end of the school 
year. 

3. Pupils who spell 44 of the words cor- 
rectly are achieving on a par with the 
75th percentile for second grade pupils 
in Cincinnati near the end of the school 
year. 
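The percentile points listed under Table I can be reproduced by linear interpolation within the grouped score intervals. The sketch below is a reconstruction, not the author's stated procedure. It assumes each interval starts at its lowest listed score and spans three points (for example 27 to 30 for the 27-29 group); that convention reproduces the published 11.9, 28.1, and 43.7. Three frequencies (348, 240, and 247) are read from partly illegible entries in Table I. With those readings the column sums to the reported total of 4520. The names `TABLE_I` and `grouped_percentile` are hypothetical.

```python
# (lowest score in interval, number of pupils) from Table I, lowest interval first
TABLE_I = [
    (0, 361), (3, 290), (6, 247), (9, 240), (12, 222), (15, 225),
    (18, 188), (21, 205), (24, 207), (27, 200), (30, 216), (33, 186),
    (36, 219), (39, 218), (42, 295), (45, 348), (48, 653),
]

def grouped_percentile(table, p, width=3):
    """Score below which p per cent of pupils fall, interpolating within an interval.

    Assumes each interval starts at its lowest listed score and spans `width` points.
    """
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    n = sum(freq for _, freq in table)
    target = n * p / 100          # number of pupils below the percentile point
    below = 0                     # pupils in all lower intervals
    for lower, freq in table:
        if below + freq >= target:
            return lower + (target - below) / freq * width
        below += freq
    raise ValueError("table contains no pupils")

for p in (25, 50, 75):
    print(p, round(grouped_percentile(TABLE_I, p), 1))
# 25 11.9
# 50 28.1
# 75 43.7
```

Rounding these three values to whole words gives the 12, 28, and 44 used in the interpretive statements above.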


² A similar test for use in grade III was also developed in
connection with the investigation reported here. Information
concerning it can be secured from the author.


TABLE I

SCORES OF 4520 SECOND GRADE PUPILS ON THE
CINCINNATI SPELLING TEST FOR
GRADE II, MAY, 1938

No. of Words      Number
Correct           of Pupils
48-50               653
45-47               348
42-44               295
39-41               218
36-38               219
33-35               186
30-32               216
27-29               200
24-26               207
21-23               205
18-20               188
15-17               225
12-14               222
9-11                240
6-8                 247
3-5                 290
0-2                 361
Total              4520

25th Percentile    11.9
Median             28.1
75th Percentile    43.7


TABLE II

PER CENT OF CORRECT SPELLINGS IN SAMPLING OF
525 PUPILS, MAY, 1938

      Word        Per Cent
 1.   ...           ...
 2.   ...           ...
 3.   ...           ...
 4.   clean         30.4
 5.   round         ...
 6.   window        ...
 7.   want          ...
 8.   help          ...
 9.   brother       ...
10.   lost          ...
11.   seven         ...
12.   glad          ...
13.   sister        ...
14.   water         ...
15.   rain          ...
16.   walk          42.0
17.   made          ...
18.   wind          ...
19.   from          ...
20.   our           ...
21.   read          ...
22.   bring         ...
23.   ride          ...
24.   told          ...
25.   room          ...
26.   saw           ...
27.   down          ...
28.   new           ...
29.   with          ...
30.   love          ...
31.   they          ...
32.   fall          ...
33.   green         ...
34.   two           ...
35.   stop          ...
36.   bird          ...
37.   your          ...
38.   father        ...
39.   baby          ...
40.   milk          ...
41.   girl          ...
42.   come          ...
43.   like          69.0
44.   play          70.1
45.   little        74.0
46.   cow           74.7
47.   car           76.8
48.   mother        77.0
49.   tree          80.1
50.   book          81.3
Average for All Words  51.4


Teachers who desire to use only some of
the 50 words will find the data in Table II
helpful in selecting the words and in inter-
preting the results. At least 20 to 25 words
should be given to secure a reasonably reli-
able result. As far as possible, words of the
same or nearly the same difficulty should be
chosen. Pupils' scores may be interpreted in
terms of the difficulty of the words selected.
The average difficulty of the words used may
be regarded as representative of the average
achievement which may be expected of pupils
near the end of the year. An illustration will
clarify the procedure to be followed. Let us
suppose that a teacher selects from Table II
the 25 words numbered 9 to 33. Using the per
cent of pupils found able to spell these words
as measures of difficulty, the average difficulty
of the 25 words would be 45.3 per cent, or ap-
proximately 45 per cent. Since 45 per cent of
25 words is about 11 words, pupils who spelled
11 of the 25 words would be achieving on a par
with the average for second grade pupils in
Cincinnati near the end of the year.
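The rule in this illustration (par score = average per cent correct for the chosen words times the number of words given) can be written as two small helpers. The helper names are hypothetical. Rounding to the nearest whole word is an assumption; the text simply says 11 of 25.

```python
import statistics

def average_difficulty(per_cents):
    """Mean per cent of pupils spelling each selected word correctly (from Table II)."""
    return statistics.mean(per_cents)

def par_score(avg_per_cent, n_words):
    """Words a pupil must spell correctly to be on a par with the average pupil."""
    return round(avg_per_cent / 100 * n_words)

# The illustration in the text: 25 words averaging 45.3 per cent.
print(par_score(45.3, 25))  # 11  (45.3% of 25 is about 11.3)
```

In practice a teacher would pass the Table II per cents for the chosen words to `average_difficulty` and feed the result to `par_score`.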
AND DEVELOPMENT OF THE STUDY


A number of questions arise whenever
there is consideration of the matter of attain-
ment of vocabulary on the part of students,
including college students. Do they show ade-
quate or significant improvement in under-
standing the meaning and correct use of words
during their studentship? Do they learn the
technical terms peculiar to the subjects they
have studied? Do they have greater facility
in the use of common or nontechnical words
after a period of schooling than they had
before?

This study was undertaken in an attempt
to find the answers to these questions and,
incidentally, to discover whether certain types
of procedure with selected groups on the col-
lege level would result in gains in any way
commensurate with the effort involved in
operating the procedures.

Several studies have been made in the field 
of vocabulary. Some have been made in lim-
ited portions of the field, such as the deter- 
mination of words necessary in the mastery 
of content, or with a view to determining 
the proper vocabulary content in the writing 
of new textbooks. Some have studied mastery
alone, while others have tried to discover the 
amount of retention as well. Some of the 
earlier studies were made in connection with 
the building of vocabulary tests, others with 
the purpose of building word tests appropriate 
to definite groups such as grades, subjects, 
and so forth, and yet others for the discovery 
of the actual working vocabularies of such 
groups. A distinction sometimes has been 
made between quantitative and qualitative 
vocabularies, the one referring to mere num-
bers of words, the other to the type of word
in terms of its fundamental nature and appro- 
priateness to the situation in which it is used. 


This article is a condensation of a Doctor's dissertation
(1936) by the author, a complete copy of which is on file in
the office of Dr. D. A. Worcester, Chairman of the Depart-
ment of Educational Psychology and Measurements, at the
University of Nebraska. Anyone interested in examining the
... in detail may consult the manuscript or write to the

In the matter of quantitative vocabularies,
the findings of different investigators have
been very diverse. For instance, one group¹ of
investigators found the range of words for
two-year-olds to be from 200 to 1000, while
another group found it to be from 69 to 677. 
Gerlach (18) reports the average vocabulary 
at five years to be 6837 words. English labor-
ers have been reported² as able to carry on a
conversation with a vocabulary of only 100 
words. Baird (4) counted 650 words among 
tourists. This of course could not be a very 
adequate test, even of the oral or speaking 
vocabulary, because of the short period of 
observation and the relatively few topics to 
come up spontaneously for discussion out of 
the many with which at least certain ones of 
the tourists would be familiar. A New York 
business paper³ claims that a man in business
dealings needs 3500 words. It would appear 
to be contingent upon the kind of business. 
In some of the earlier studies clear distinc- 
tions were not made between the various types 
of vocabulary such as oral or speaking, writ-
ing, reading, and understanding; nor between
the results obtained by different types of tests 
such as recognition, recall, and so forth. 

In order to find the number of words 
known, various devices or methods have been 
used. With very young children ordinarily a 
careful record is kept of the words actually 
used in a given time. For older children or 
adults, the dictionary method is often used. 
Terman (46) used this method in determin- 
ing the words for his test. He made use of 
the Laird and Lee Vest Pocket Dictionary 
containing about 18,000 words. To make his 
list of 100 common words, he took the last 
word in every sixth column. This was sup- 
posed to give a representative list of common 
words. Following this lead, other investi- 
gators used this method with slight variations, 
as will be noted later. Difficult or unfamiliar
words are sought and reported by the pupils
themselves. Lists, taken from texts, composi-
tions, or general reading are checked for
common occurrence on the Thorndike 20,000
word list, the Powers' 10,000 word list of
scientific terms, the Jones' spelling list of
4532 words, or other lists.

¹ Reported but not cited by Sister Irmina (43).
² Reported but not cited by Sister Irmina (43).
³ Ibid.

JOURNAL OF EXPERIMENTAL EDUCATION
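Terman's sampling rule described above (the last word in every sixth column of an 18,000-word dictionary) is easy to state as code. The sketch below represents the dictionary as a list of columns, each a list of words; that representation and both function names are hypothetical. The second helper shows a common way a dictionary sample is scaled up to an estimate of total vocabulary (proportion of sampled words known, times dictionary size). The text does not spell out this step, so treat it as an assumption.

```python
def sample_last_words(columns, every=6):
    """Return the last word of every `every`-th column (columns 6, 12, 18, ...)."""
    return [column[-1] for column in columns[every - 1::every] if column]

def estimate_vocabulary(known, sampled, dictionary_size):
    """Assumed extrapolation: share of sampled words known, scaled to the dictionary."""
    return known * dictionary_size / sampled

# Hypothetical dictionary of 18 columns with two words each.
columns = [[f"word{c}a", f"word{c}b"] for c in range(1, 19)]
print(sample_last_words(columns))  # ['word6b', 'word12b', 'word18b']
print(estimate_vocabulary(known=60, sampled=100, dictionary_size=18_000))  # 10800.0
```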

To test whether or not a pupil knows the 
words involved, five ways have been rather 
commonly used. First, present the word to 
the pupil and ask for a definition or synonym. 
Second, a method used by Kirkpatrick (29) 
and others gives the subject a list of words 
determined by the dictionary method (Kirk- 
patrick used Webster's Academic) with in- 
structions to mark all words as known, un- 
known, or uncertain. Third, a word is built 
into a multiple choice test with four or five 
definitions in the form of synonyms or sen- 
tences following, and the subject underlines 
the one correct choice. Fourth, the subject 
may use the word in a sentence. Sister 
Irmina (43) holds that this gives the greatest 
assurance that the subject actually knows the 
word. Fifth, the subject simply tells some- 
thing about the word; for example, that it is 
a word used in botany, and so forth. 

The studies which have been made involve 
the size of vocabulary of various groups, the 
amount of yearly or semester gains, and vari- 
ous procedures for learning and testing, as 
well as the relationship of vocabulary to men- 
tal ability, to mastery of school subjects, the 
vocabulary burden of textbooks, and numer- 
ous other related problems. Kirkpatrick 
found that college sophomores have a vocab- 
ulary of 20,120 words, and a positive corre- 
lation between vocabulary and school grades. 
Babbitt (3), by much the same method, 
found that college sophomores have a vocab- 
ulary of between 50,000 and 60,000 words 
and a high correlation between vocabulary 
and scholarship. Brandenburg (7) found an 
annual increase among college sophomores of 
1400 words in vocabulary. 

Witty and Fry (53) made an analysis of 
written compositions of college students at 
the University of Kansas. They discovered 
that the correlation between recognition of 
words, as measured by the Inglis Test, and 
usage of words beyond Thorndike’s 10,000 
was about -.18. In a composition written in
25 minutes, 85 per cent of the words used by 
freshmen are found in Thorndike’s first 1000 
and 84 per cent of those used by seniors are 
in the first 1000 also. This would indicate
little growth in usage during the college course
and shows a conspicuous lack of agreement 
between ability to recognize words and the 
tendency to use the words in written work 
when the student is working under a time 
limit. The median number of words written 
by freshmen in 25 minutes was 231.07, while 
for seniors it was 277.77. This shows greater 
fluency among the latter and suggests greater 
familiarity with the words used. 


Among the studies of methods for enlarg-
ing vocabularies, that of Harlan (22) is a
good representative. He attempted to measure
the growth of vocabulary in first semester 
psychology students. He first tried to deter-
mine the essential words in the subject by
having each student make a glossary of tech-
nical terms, especially those which caused him
difficulty. In this way he secured a list of
over 400 terms, 176 of which he regarded as 
being of sufficient frequency to be counted as 
essential to a mastery of psychology. Relative 
difficulty of these words was found by pre- 
liminary tests and the 150 most difficult ones 
were built into three multiple choice tests of
50 items each and of equal difficulty. These 
tests were given to over 500 students in five 
normal schools, and norms, validity, and reli- 
ability were determined. These tests were 
given before taking the eighteen-week course 
in beginning psychology and again after tak-
ing it. With one group reported, the median
score on tests I and II (100 items) was 53
before taking the course and 85 after taking
the course, thus giving a median gain of 32
words. The correlation coefficient between gain
and initial score on the tests was -.69. This means
that those who had the smallest vocabulary in 
the beginning made the greatest gains, a find- 
ing in agreement with that of Kellogg (28). 
Harlan gave the tests at the end of 18 weeks 
and again at the end of 24 weeks. The median 
gain for the last six weeks was only 5.6 points 
or one-sixth of the 32 points gain for the first 
18 weeks. The correlation of test scores with 
semester grades was .61 with the exception o! 
one normal school where it was .71. A check 
on ages showed the younger students to have 
a tendency to make the greater gains. The 
correlation between results of the combine? 
test and scores on the Terman Group Test of 
Mental Ability is .71 after taking the course 
and .63 before. Harlan remarks that many 
ordinary words are used in psychology with 3 
technical meaning. 
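The meaning of a negative coefficient such as Harlan's −.69 can be illustrated with a small computation. The sketch below assumes the ordinary Pearson product-moment formula and uses made-up initial and final scores (not Harlan's data); because the low starters gain the most, the correlation between initial score and gain comes out negative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical initial and final scores on a 100-item test (illustration only).
initial = [40, 45, 50, 55, 60, 65]
final = [78, 80, 82, 84, 85, 88]
gains = [f - i for i, f in zip(initial, final)]

print(gains)  # [38, 35, 32, 29, 25, 23]
print(round(pearson_r(initial, gains), 3))
```

In this toy case the final scores vary much less than the initial ones, so r is close to −1; real class data would be expected to give a weaker coefficient.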
In a study by Fisher (17), also in the field of psychology, 1200 words were reported by a class under three heads: (1) words that could be used if necessary, though vaguely; (2) words understood if used by someone else; and (3) words totally unfamiliar. Of the 1200 words thus reported, 163, all of which had been handed in by at least 25 students, were selected for a “word meaning test”. The study resulted in the following conclusions:


1. Words tend to become less difficult through the semester.

2. Statistical calculations for the relation 
between estimated and actual learning in re- 
gard to consistency, independence, and asso- 
ciation prove that: (a) there was consistency 
between estimated and actual learning on the 
part of students, (b) estimated learning was 
closely related to and dependent upon actual 
learning, and (c) there was marked associa- 
tion between estimated and actual learning. 

3. There was a positive correlation (.614 ± .028) between scores on the Word Meaning Test and achievement in the course.
Vocabulary difficulties did not seem to affect 
achievement to the degree implied by the 
students themselves. 

4. Correlation between scores on the Word Meaning Test and Otis Scale scores was positive (.507 ± .033).

5. By comparison with Thorndike’s Word 
List and with English’s A Student’s Diction- 
ary of Psychological Terms it was shown: 
(a) six words in the Word Meaning Test 
occurred in the Thorndike Word List, 
(b) fifty words occurred in the dictionary, 
(c) only one occurred in both, and (d) Gates 
expects his students to understand a relatively 
large number of words which they would 
rarely need in general reading. 

6. It is recommended that there be further 
study in the direction of finding out: 
(a) which of these 163 words are the more 
perplexing, (b) in order of increasing diffi- 
culty the relative pertinence of those words to 
the student in educational psychology, and 
(c) a best answer list of the meanings to be 
constructed from the 37,161 responses to the 
Word Meaning Test. This would serve as a 
partial measure of the vocabulary status of 
the students as they begin the course and then 
a second trial at the end would measure 
learning. 
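Fisher's coefficients are reported with probable errors (.614 ± .028 and .507 ± .033). The sketch below assumes the conventional formula of the period, PE = 0.6745(1 − r²)/√N. The excerpt does not state N. As an illustrative consistency check only, 37,161 responses divided by 163 words gives about 228 responses per word, and an N of 228 reproduces both reported probable errors.

```python
from math import sqrt


def probable_error_r(r, n):
    """Probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


def implied_n(r, pe):
    """Sample size that would yield the stated probable error for r."""
    return (0.6745 * (1 - r ** 2) / pe) ** 2


responses_per_word = 37161 / 163  # about 228 (an inference, not stated in the text)
print(round(responses_per_word, 1))
print(round(probable_error_r(0.614, 228), 3))  # 0.028
print(round(probable_error_r(0.507, 228), 3))  # 0.033
print(round(implied_n(0.614, 0.028)))
```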

Eurich (15) tested out a method of enlarging
vocabularies at the University of Minnesota.
He divided the freshman English
classes into experimental and control groups. 
Both groups were given a series of English, 
vocabulary, reading and intelligence tests at 
the beginning of the Fall Quarter. Through- 
out the term the experimental group was given 
special vocabulary drills of 100 words each 
week. At the end of the term both groups 
were given a battery of tests similar to the 
initial tests. The process was repeated during 
the Winter Quarter. During the Spring Quar- 
ter no drills were given, but both groups took 
a course in composition or rhetoric. At the 
end of this quarter, all were given a final 
series of tests again. The average gain for the 
whole period was found as well as the average 
gain for each quarter. Out of 1000 drill
words, 150 were taken by chance for the final
vocabulary test. The members of the control 
group were paired with members of the ex- 
perimental group on the basis of the scores on 
the college ability test. The time for taking 
the test was limited to 20 minutes. The stu- 
dents were divided into three groups on the 
basis of the number of words completed in 
the 20 minutes. The fastest group increased 
most, ranging, for the experimental students, 
from an average on the preliminary test of 
64 to an average on the final Spring Quarter 
test of 81, while the control group ranged 
from 65 to 69. The middle experimental 
group ranged from 48 to 61 and the control 
group from 48 to 55, while the slow experi- 
mental group ranged from 36 to 45 and the 
slow control group from 35 to 42. Besides this 
test on the drill words, both groups were 
tested at the same periods on a general vocab- 
ulary list which in no case overlapped any of 
the 1000 drill words. The experimental group 
here ranged from 46 to 51 words, while the 
control group ranged from 46 to 50. Eurich 
concludes that students enlarge their vocab- 
ularies through special attention directed to 
that end, and that it is better to work with 
specific drills than to seek
vocabulary growth by general and indirect 
means. 
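Eurich's comparison reduces to simple differences of the reported averages. The sketch below recomputes the gain of each experimental and control group on the drill words and the experimental group's advantage; the dictionary layout and function names are illustrative, while the numbers are the averages quoted above.

```python
# Average scores on Eurich's drill-word test, as reported above:
# (preliminary test, final Spring Quarter test) for each speed group.
EURICH_DRILL = {
    "fast": {"experimental": (64, 81), "control": (65, 69)},
    "middle": {"experimental": (48, 61), "control": (48, 55)},
    "slow": {"experimental": (36, 45), "control": (35, 42)},
}


def gain(scores):
    """Final average minus preliminary average."""
    start, end = scores
    return end - start


def drill_advantage(table):
    """Experimental gain minus control gain for each speed group."""
    return {
        group: gain(pair["experimental"]) - gain(pair["control"])
        for group, pair in table.items()
    }


for group, pair in EURICH_DRILL.items():
    print(group, gain(pair["experimental"]), gain(pair["control"]))
print(drill_advantage(EURICH_DRILL))  # {'fast': 13, 'middle': 6, 'slow': 2}
```

On the general vocabulary list the same arithmetic gives gains of 5 words (46 to 51) for the experimental group and 4 words (46 to 50) for the control group, an advantage of a single word.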


It is evident from the preceding discussion 
that there is wide divergence in the findings of 
the studies in vocabulary as well as in the 
methods, techniques, and criteria used. Be- 
cause of this lack of agreement, there seemed 
to be ample justification for further experi- 
mentation and investigation in the field, hence 
the following studies by the present writer. 













Procedure I 


In the first part of this study, an attempt 
is made to discover whether a procedure can 
be set up in a university class and adminis- 
tered in a way that will secure an amount of 
vocabulary learning beyond that which nor- 
mally ensues when no especial attention is 
given to the matter, and incidentally to 
measure the growth in word mastery under 
usual conditions. 

In the second semester of 1932-33, the 27 
students who were taking Education 162, an 
advanced course in educational psychology, 
were asked to list all the unknown words 
found in reading their text, Burnham’s A 
Wholesome Personality, or other references. 
They were to look up the meanings of these 
words and to hand them in with their definitions
to the instructor. These lists were not re- 
turned to the students, but were scored by the 
writer and tabulations were made of the total 
number of words reported, the number of 
times the same word was reported by the 
same person, the number of times words were 
correctly defined, the number of times words 
were incorrectly defined, and the number of 
words reported but not defined at all. The 
words were also classified as to length—short 
words, one or two syllables, and long words, 
three syllables or more. These findings are 
summarized in Table I. 

TABLE I

FREQUENCIES OF ITEMS SCORED ON THE WORD LIST UNDER PROCEDURE I

Different words reported                                502
   1. One- or two-syllable words                        104
   2. Three-or-more-syllable words                      398
Word-frequency product                                 2828
   1. Reported but not defined (excluding 3)             66
   2. Incorrectly defined (excluding 3)                  21
   3. Reported twice or more by same student             67
   4. Correctly defined (excluding 3)                  2654

TABLE II

DISTRIBUTION OF THE 502 WORDS ON THE THORNDIKE 20,000 WORD LIST
From Table I it is seen that this procedure
gave a working list of 502 words for the
semester. These words were not necessarily
obtained chapter by chapter nor handed in at
stated times, but each student was urged to
note any unfamiliar word, whether in his first
reading or subsequent reading or review. It is
presumed that each student worked independ- 
ently, though no effort was made to check 
upon this. Certain it is, however, that no stu- 
dent knew what the total list of words con- 
tained. The check on the number of syllables 
was made simply out of curiosity and shows 
about four times as many long words as short 
ones. Of the total number of 2828 frequencies 
of all words reported, the words in 66 word- 
frequencies were not defined at all and the 
words in 21 word-frequencies were incorrectly 
defined. The 21 cases of incorrect definitions 
may have been due to the very common use 
among students of small dictionaries, or to
poor reading, or to careless interpretation or 
even to inability to understand the definition 
given. This latter explanation may easily be 
true of such words as “eidetic” or “modali- 
ties”. Cases of repetition usually happened 
when a considerable length of time elapsed 
between the two or more reports. Since the 
student did not ordinarily keep a copy of the 
words he reported, he probably forgot just 
what had been included in his former lists. 
The frequencies with which the words were 
reported varied from 24 for one word to one 
for 129 words. 


In order to check the commonality of these 
words they were distributed on the Thorndike 
list of 20,000 words. This is shown in Table
II, where it may be noted that 296 of the 502
words are found in the list, while 206 (marked X
in the table), or about 40 per cent, are not.
Eighty-one are found in the first 10,000, and 
only 6 in the first 5000. According to this 
criterion, the word list for this procedure may 
well be regarded as highly technical, since 
about 40 per cent of the list is not found in 
the Word Book. 
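The percentages in this paragraph and the preceding one follow directly from the counts in Tables I and II. A short check, using only those counts:

```python
def percent(part, whole):
    """Share of `whole` made up by `part`, in per cent."""
    return 100 * part / whole


TOTAL_WORDS = 502
IN_THORNDIKE, NOT_IN_THORNDIKE = 296, 206
SHORT_WORDS, LONG_WORDS = 104, 398  # one or two syllables vs. three or more

assert IN_THORNDIKE + NOT_IN_THORNDIKE == TOTAL_WORDS
assert SHORT_WORDS + LONG_WORDS == TOTAL_WORDS

print(round(percent(NOT_IN_THORNDIKE, TOTAL_WORDS), 1))  # 41.0, "about 40 per cent"
print(round(LONG_WORDS / SHORT_WORDS, 2))                # 3.83, "about four times"
```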


Students were told repeatedly that they would be tested from time to time on words taken from the lists handed in. This was probably the main factor in the motivation involved. Four sub-tests* of 20 words each, chosen at random from the 502 words, were given at intervals of about four weeks, followed by a final test of 41 words taken from the sub-tests; about 58.5 per cent of the words in this final test are not found in Thorndike's Word Book (20,000 words). This indicates, of course, that the list was highly technical. However, of the 1107 pupil word responses, 858 were correct, an average of 31.78 words out of the 41, or a per cent score of 77.5.

*A full description of these tests and the manner of giving them may be found in the dissertation.

In order to determine whether or not the mastery of 77.5 per cent indicated above was a result of the procedure outlined in this chapter, the list of words used in the final test was given to the class in Education 262 at the end of the second semester, 1934-35. Education 162 had been raised to a 200 course after the preliminary work on this procedure was done. Whereas the prerequisite for Education 162 was one 3-hour course in educational psychology, for Education 262 it was that prerequisite plus 4 hours of education. Certainly the educational status of this group was not lower than that of the former one. There were 20 students in this class in Education 262, and Burnham's A Wholesome Personality was still used as the text. The average score made was 21.75 words, or an average per cent score of 53.
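The average and per cent scores reported for Procedure I follow from the number of correct responses, the class size, and the 41-item test. A minimal sketch: the 27 students are the Education 162 class described above, and Group B's total of 435 correct responses is derived here from its average (21.75 × 20), not reported in the text.

```python
def summarize(correct, students, items=41):
    """Return (total responses, average correct per student, per cent score)."""
    responses = students * items
    return responses, correct / students, 100 * correct / responses


responses, avg, pct = summarize(858, 27)
print(responses, round(avg, 2), round(pct, 1))  # 1107 31.78 77.5

_, avg_b, pct_b = summarize(435, 20)            # Education 262 (derived total)
print(round(avg_b, 2), round(pct_b, 1))         # 21.75 53.0
```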

Furthermore, it was necessary, in order to 
measure the actual increase in vocabulary, to 
know approximately the number of these 
words the students of an incoming class might 
know. Otherwise there would be uncertainty 
as to how many were actually acquired as a 
result of taking this course. So the same list 
of 41 words was given at the beginning of 
the first semester of 1935-36 to a class of 23 
students in Education 263. Education 263 
was used because there was no class at this 
time in Education 262, and the prerequisites 
of the two courses are exactly the same. 
None of this group had taken Education 262. 
The average score made was 11.22, with a 
range of 5 to 20, or an average per cent score 
of 36.5. The results of the test with the three 
groups are shown in Table III. 

TABLE III

COMPARATIVE RESULTS OF THE TEST GIVEN TO THREE DIFFERENT GROUPS IN ADVANCED EDUCATIONAL PSYCHOLOGY

Group   Class                           No. in group   Av. no. of words
A       Ed. 162, end of course          27             31.78
B       Ed. 262, end of course          20             21.75
C       Ed. 263, beginning of course    23             11.22

Gain of Group A over Group B    10.03
Gain of Group A over Group C    20.56
Gain of Group B over Group C    10.53

A comparison of the results reveals a marked gain on the part of the experimental group, Group A, over both Groups B and C. Group A knew nearly three times as many words at the end of the course using Procedure I as did Group C at the beginning, a gain in average per cent score of 41 points. Group B
knew nearly twice as many words as Group C, 
or they made a gain in average per cent score 
of 16.5 points without the use of Procedure I. 
A comparison of Groups A and B, both hav- 
ing taken the course, though with Group A 
using Procedure I while Group B did not, 
shows an average per cent gain for Group A 
of 24.5 points. 
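The word gains in Table III and the comparisons "nearly three times" and "nearly twice" can be checked from the three group averages. A small sketch using those averages:

```python
from itertools import combinations

AVERAGES = {"A": 31.78, "B": 21.75, "C": 11.22}  # words correct out of 41


def compare(avgs):
    """Difference and ratio of average scores for each pair of groups."""
    result = {}
    for high, low in combinations(sorted(avgs), 2):
        result[(high, low)] = (
            round(avgs[high] - avgs[low], 2),
            round(avgs[high] / avgs[low], 2),
        )
    return result


for pair, (diff, ratio) in compare(AVERAGES).items():
    print(pair, diff, ratio)
# ('A', 'B') 10.03 1.46
# ('A', 'C') 20.56 2.83
# ('B', 'C') 10.53 1.94
```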

The results of this experiment seem to in- 
dicate that it is possible to secure a consider- 
able degree of mastery (about 77 per cent) 
of the more or less technical terms of a sub- 
ject, such as educational psychology, by the 
technique described in this procedure. It is 
of interest to note here that the results of 
Eurich’s (15) experiment did not show such 
large gains. In a situation where university 
freshmen at the University of Minnesota were 
drilled on a list of 1000 words and subse- 
quently tested, the maximum gain in average 
per cent score was about 15, whereas here it 
is about 40. 

Moreover it is also shown that, by what 
may be considered an ordinary class proce- 
dure where no particular stress is placed on 
the learning of vocabulary, there may be a 
resultant amount of mastery of about 53 per 
cent of the words more or less peculiar to the 
subject. 

If we admit an initial knowledge of 11 
words of this list of 41, there is a gain of 
about 10 words by an ordinary procedure
and a gain of about 20 words by the method 
of this experiment, a ratio of 2 to 1 in favor 
of the latter. Moreover in all probability the 
educational status of the class in Education 
162 was lower than that of either of the other 
two groups and hence its relative gain was 
probably greater than the figures would actu- 
ally indicate. On the whole the method seems 
to have sufficient merit to justify further 
investigation and trial. 


Procedure II


An attempt is made here to use a different 
set-up from that of Procedure I, but still to 
measure the increase of word knowledge re- 
sulting from a semester’s work in a university 
class. 

During the year 1932-33, the instructor in 
Education 30, a freshman class in orientation, 
had his students gather lists of unknown 
words found in the textbook used in the 
course, Werner’s Every College Student’s
Problems, and hand them in defined, also with 
the chapter indicated in which they were 
found. A total of 1350 words was thus ob- 
tained, ranging from 39 in Chapter I to 113 
in Chapter IV. A few words were reported 
as occurring in two or more chapters, some- 
times in slightly different form, but in the 
main there were few repetitions. These lists 
were taken chapter by chapter and the 10
words turned in most frequently from each 
chapter were made into lists for subsequent 
use in these experiments. 

In the first semester of 1934-35, these lists
of 10 words for each of the 15 chapters were
made a part of the assignments in the classes 
in Education 30. This was possible because 
the textbook used was still Werner’s book. 
The lists were in mimeographed form and 
were furnished to the students so that they 
could be studied and the definitions written 
out. The lists of words and definitions were 
then handed to the instructor, whereupon 
they were carefully read, corrected, and often 
commented upon. The lists thus annotated 
were returned to the students and kept by 
them until they had all 15 lists. This gave 
each student a ready reference list where he 
could quickly find the meaning of any of the 
words when needed in case he had forgotten 
them. Furthermore, to make sure he did keep 
the lists, he was required to bring in the 15 
lists at the end of the semester. There were 
no tests on the words during the semester and 
no suggestion to the students that they ever 
would be tested on these words. Hence there 
was no especial motivation aside from their 
own curiosity or eagerness to learn and the 
possible desire for the approval of the in- 
structor. It is also possible that the so-called 
“collecting instinct” impels some people to 
gather words for the mere joy of possession. 
It is also probable that some students realize 
the advantage of acquiring a mastery of the 
vocabulary of a subject in its pursuit. 


At the end of the semester the group was given a test on a randomly chosen half of the 10 words for each chapter, 75 words in all. Distribution on the Thorndike scale shows 17 words in the first 10,000, 52 in the second 10,000, and 6 not in the list.
By this criterion the words of the second pro- 
cedure are not so technical as were the 41 
words of the first study, but still they tend 
strongly toward the upper end of the dis- 
tribution. 
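The test construction just described, a random half of each chapter's 10-word list, can be sketched as follows. The word lists are hypothetical placeholders, and the fixed seed only makes the draw repeatable.

```python
import random


def build_test(chapter_lists, per_chapter=5, seed=None):
    """Draw `per_chapter` words at random, without replacement, from each list."""
    rng = random.Random(seed)
    items = []
    for chapter, words in chapter_lists.items():
        for word in rng.sample(words, per_chapter):
            items.append((chapter, word))
    return items


# Hypothetical placeholder lists: 15 chapters with 10 words each.
lists = {ch: [f"ch{ch}-word{i}" for i in range(1, 11)] for ch in range(1, 16)}
test = build_test(lists, seed=1935)
print(len(test))  # 75
```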
More than 200 of the 245 students in Education 30 during the first semester of 1934-35 were girls, and since 104 of these had been paired with a like number of girls from the Arts and Science College for a subsequent experiment, it was convenient to use them here. They were slightly superior to the whole group, as shown in Table IV, but probably not enough to affect the results greatly.

TABLE IV

MEDIANS AND QUARTILES OF THE SCORES OF THREE GROUPS OF T. C. FRESHMEN ON THE O.S.U.P.T.

Part I. Total Score

        Mixed group (245)    Mixed group (170)    Girls (104)
        1st Sem. 1934-35     1st Sem. 1935-36     1st Sem. 1934-35
Md      84.35                86.5                 97
Q3      107.10               106.75               112.22
Q1      66.35                66.33                76.67
Q       20.36                20.21                17.78
V       .242                 .234                 .183

Part II. Vocabulary Score

        Mixed group (245)    Mixed group (170)    Girls (104)
        1st Sem. 1934-35     1st Sem. 1935-36     1st Sem. 1934-35
Md      21.6                 23.24                25.5
Q3      28.58                30.76                31.3
Q1      15.75                16.44                19.…
Q       6.42                 7.16                 6.18
V       .297                 .3086                …
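Table IV reports, for each group, the median (Md), the quartiles (Q1, Q3), Q, and V. Q is the usual semi-interquartile range, (Q3 − Q1)/2. The table does not define V, but the tabled values agree with V = Q/Md, a relative measure of spread; the sketch below assumes that definition.

```python
def semi_interquartile_range(q1, q3):
    """Q = (Q3 - Q1) / 2."""
    return (q3 - q1) / 2


def relative_variability(median, q1, q3):
    """V = Q / Md (definition inferred from the tabled values, not stated)."""
    return semi_interquartile_range(q1, q3) / median


# (Md, Q1, Q3) taken from Table IV.
rows = {
    "total, mixed 245": (84.35, 66.35, 107.10),
    "total, mixed 170": (86.5, 66.33, 106.75),
    "total, girls 104": (97, 76.67, 112.22),
    "vocabulary, mixed 245": (21.6, 15.75, 28.58),
}
for label, (md, q1, q3) in rows.items():
    print(label, round(semi_interquartile_range(q1, q3), 2),
          round(relative_variability(md, q1, q3), 3))
```

For the girls' total score the quartiles give Q ≈ 17.78, which with Md = 97 yields the tabled V of .183.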
Omitting here most of the details of administering, checking, and scoring, it is sufficient to note that the final tests given to the 104 girls gave the results shown in Table V.


TABLE V

SCORES ON THE 75-WORD TEST

Group   No.   Total       Correct     Av. correct   Per cent
              responses   responses   responses     score
A       104   7800        3310        31.8          42.5
B       25    1875        79…         31.67         42.25
C       170   12750       2635        15.5          20.6

Gain of Group A over Group B     0.16
Gain of Group A over Group C    16.33


As a measure of the representativeness of
these 104 girls, the test was also given to 13 
boys and 12 girls taken at random from the 
remainder of the 245 in the class, and the 
final scores of both groups checked against 
an initial score made by 170 students of Edu- 
cation 30 during the first semester of 1935- 
36. The average word gain of the experimen- 
tal group is 16.33 words or a gain in average 
per cent score of 21.83. 
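The averages, per cent scores, and the 16.33-word gain in Table V come from the totals of correct responses on the 75-word test. A minimal sketch for Groups A and C:

```python
def average_and_percent(correct, students, items=75):
    """Average correct responses per student and per cent score."""
    return correct / students, 100 * correct / (students * items)


avg_a, pct_a = average_and_percent(3310, 104)  # Group A: the 104 girls
avg_c, pct_c = average_and_percent(2635, 170)  # Group C: initial test, 1935-36
print(round(avg_a, 2), round(avg_c, 2), round(avg_a - avg_c, 2))  # 31.83 15.5 16.33
print(round(pct_a - pct_c, 2))
```

The per cent difference computed from the raw totals is about 21.8, close to the 21.83 given in the text; the small discrepancy presumably reflects rounding of the per cent scores.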

While the results of this procedure show measurable gains in vocabulary, the gains are not so pronounced as in the case of Procedure I. There the resulting gain in average per cent score was about 40, while here it is only about 22. Also, the educational status of the two experimental groups is quite different. Those in Education 30 were freshmen, while those in Education 162 were juniors and seniors. Probably the latter had learned better how to study or had realized more fully the value of mastering the vocabulary of their school subjects. However, it seems highly probable that the difference in achievement between the two groups was due principally to the motivation of impending tests used in Procedure I. There is also a more complete limitation of the field in the second set-up. The whole task is definitely set before the student. He probably felt that in learning the given lists he was doing all that was assigned or desired of him. In the first group, however, no one knew all the words on the total list or which words would be used in the test. This very uncertainty may have been a stimulus to master all the words of the text and hence may have proved to be an advantage rather than a weakness.


Procedure III


The aim in this procedure is to discover whether there is an appreciable enlargement of general vocabulary, such as is represented by the words in Part I of the Ohio State University Psychological Test, among freshmen at the University of Nebraska as a result of their first semester's studentship.


At the beginning of the school year 1934-35, the freshmen of the Teachers College and of the Arts and Science College of the University of Nebraska were given the Ohio State University Psychological Test, Form 17. One part of this test is an 80-word, 5-option multiple-choice vocabulary test. Forty-nine of these words are found in the first 10,000 words of the Thorndike list and only two are not on the list at all; hence the words are much more commonly used than were the words of the first two procedures. Scores were available for 520 Arts and Science freshmen and for 245 Teachers College freshmen. Medians and quartiles of the initial scores are shown in Table VI.

Permission was obtained from Dr. Toops, Chairman of the Ohio College Association Committee on Intelligence Tests for Entrance, to make mimeographed copies of Part I of the Ohio State University Psychological Test in order to repeat it at the end of the semester, thus getting a direct measure of the average increase in general word knowledge on the part of freshmen at the University of Nebraska during their first semester in school.

TABLE VI

MEDIANS AND QUARTILES OF SCORES OF 520 A.S. FRESHMEN AND 245 T.C. FRESHMEN ON PART I, O.S.U.P.T.

Total Score

Group        Md      Q3      Q1      Q
520 A.S.     105.0   132.4   78.6    26.9
245 T.C.     84.4    106.1   66.4    19.9
765 Both     98.1    126.2   72.9    26.6

Vocabulary Score

Group        Md      Q3      Q1      Q
520 A.S.     26.3    36.7    20.5    8.…
245 T.C.     21.6    28.6    15.8    6.4
765 Both     26.2    34.0    18.2    7.9

The best means of administering the final test to freshmen in the University of Nebraska seemed to be through the English classes, since English is a required subject for all regular freshmen. This was made possible through the courtesy of the English Department. Final scores on the vocabulary test were obtained for 400 Arts and Science freshmen out of the 520 for whom we had initial scores, and for 197 Teachers College freshmen out of the original 245. Table VII shows the median gains of these groups.

TABLE VII

MEDIANS AND MEDIAN GAINS OF 400 A.S. AND 197 T.C. FRESHMEN ON PART I, O.S.U.P.T.

               A.S.                T.C.                Combined
               Initial   Final     Initial   Final     Initial   Final
Medians        28.69     41.74     22.15     35.56     26.56     39.34
Median gains        13.05               13.41               12.78

The fact that these 597 students were classified into English 0, English 1, and English 3 gave a chance to check on vocabulary growth in these groups. English 0 is a non-credit course taken by those who rank low on a placement test given before entrance. English 1 is for those who show average ability, and English 3 is for the superior ones. The classification of students in the English courses by the department authorities proved to correspond closely to ability as shown by the Ohio State University Psychological Test. The per cent gain in vocabulary is shown in Table VIII. It is seen that, so far as these groups are concerned, the greatest gains were made by those who had the greatest ability and also the greatest initial score in vocabulary. Of course, this difference in the gains of these sections may be partly due to the difference in the type of work done in the different courses, and even in the different sections of the same course under different instructors. In fact, the differences may be so great as practically to amount to entirely different subjects. It is highly probable that English 3 leads more directly to a larger use of terms than does English 0 or even English 1.
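The median gains in Table VII are the differences between the final and initial medians. The sketch below recomputes them. Its second part uses a hypothetical three-student example to show that a difference of medians need not equal the median of the individual students' gains, so the two should not be confused when reading the table.

```python
from statistics import median

# (initial median, final median) from Table VII
TABLE_VII = {"A.S.": (28.69, 41.74), "T.C.": (22.15, 35.56), "Combined": (26.56, 39.34)}
median_gains = {g: round(final - initial, 2) for g, (initial, final) in TABLE_VII.items()}
print(median_gains)  # {'A.S.': 13.05, 'T.C.': 13.41, 'Combined': 12.78}

# Hypothetical scores for three students (illustration only).
before = [1, 2, 10]
after = [5, 3, 11]
diff_of_medians = median(after) - median(before)                  # 5 - 2 = 3
median_of_gains = median([a - b for a, b in zip(after, before)])  # median of 4, 1, 1
print(diff_of_medians, median_of_gains)  # 3 1
```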

There were 535 Arts and Science freshmen 
who took the final test for whom there were 
no initial scores. There is no reason to believe 
them very different in ability from the 400 
who had both initial and final scores. In fact 
the application of the formula for the reli- 
ability of the difference of two measures 
shows that the chances are about 99.68 to 
100 that the difference is not significant. The 
median vocabulary score of this group was 
32.2 words against 41.7 of the 400. This dif- 
ference may be due, in part at least, to prac- 
tice effect, though this seems high for an 
interval of 18 weeks. 
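The text does not give the formula it used for the reliability of a difference. The conventional one of the period divides the difference by its standard error, √(σ₁² + σ₂² − 2rσ₁σ₂), with r = 0 for independent groups, and converts the resulting critical ratio into "chances in 100" with the normal curve. The sketch below is written under those assumptions and uses the Python standard library's NormalDist.

```python
from math import sqrt
from statistics import NormalDist


def critical_ratio(diff, se1, se2, r12=0.0):
    """Difference divided by its standard error.

    se1, se2: standard errors of the two measures (e.g. medians or means);
    r12: correlation between the measures (0 for independent groups,
    positive for matched pairs).
    """
    se_diff = sqrt(se1 ** 2 + se2 ** 2 - 2 * r12 * se1 * se2)
    return diff / se_diff


def chances_in_100(cr):
    """Chances in 100 that the true difference lies on the observed side of zero."""
    return 100 * NormalDist().cdf(cr)


print(round(critical_ratio(3.0, 0.6, 0.8), 2))  # 3.0
print(round(chances_in_100(2.73), 2))           # 99.68
print(round(chances_in_100(1.34), 1))           # 91.0
```

Read this way, 99.68 chances in 100 correspond to a critical ratio of about 2.73, and the "91 to 100" quoted in Procedure IV to one of about 1.34. For matched pairs such as those in Procedure IV, a positive r12 would be appropriate because paired scores are correlated.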

The data of this study in growth of vocab- 
ulary among Nebraska University freshmen 
indicate a positive measurable growth during 
their first semester’s work in the university. 
How significant the amount found may be, 
the writer cannot indicate, because there are 
no well established criteria for gauging the 
significance of such growth. It is believed, 
however, that the question of the existence of 
a positive growth has been answered, and 
answered in the affirmative. The purpose of 
this part of the study is served in the dis- 
covery of a measurable growth. 

Incidentally it was also found that the 
groups with the lowest initial scores did not 
always make the largest gain. This is not in 
accord with the findings of several experi- 
menters, especially Harlan (22) and Kel- 
logg (28). This latter phase will be discussed 
more fully later in the present report. 
Procedure IV

In Procedure III it appeared that the Teachers College group, though lower in general ability, made slightly larger gains on the Ohio State University Psychological Test than did the Arts and Science group. Since they had been the experimental group under Procedure II, there seemed a possibility that their greater gain might be due to that procedure. The work in Procedure IV is a further investigation of these comparative results.


Inasmuch as there are doubtless a number of factors that enter into the learning of vocabulary, as in many other learning situations, comparisons without a control of some or all of these factors cannot be very conclusive. Two groups paired individual with individual, so as to make the various elements of the two groups as nearly alike as possible, offer a much better means for studying performance or comparative gains. Consequently the task was undertaken of pairing individuals from the 197 students of the Teachers College group used in Procedure III with students from the 400 Arts and Science freshmen used in the same procedure. Since there were very few boys in the Teachers College group, it was thought best to pair only girls from each group. The pairing was done on the basis of college year, English course (English 0, English 1, or English 3), sex, total score on the Ohio State University Psychological Test, and score on the vocabulary section of the same test. On this basis 108 pairs were found. It will be recalled that all individuals of both these groups had taken the Ohio State University Psychological Test at the beginning of the first semester, 1934-35. A vocabulary test of 80 items is a part of this test. This was repeated at the end of the semester and the gains computed. The results are shown in Table IX.
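The pairing described above can be sketched as a matching procedure: exact agreement on college year, English course, and sex, and near agreement on the total and vocabulary scores. The text does not say how close "as nearly alike as possible" had to be, so the tolerances, the Student record, and the greedy closest-first rule below are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    """Hypothetical record holding the pairing criteria named in the text."""
    sid: str
    year: int      # college year
    english: int   # English course: 0, 1 or 3
    sex: str
    total: int     # O.S.U.P.T. total score
    vocab: int     # O.S.U.P.T. vocabulary score


def pair_students(group_a, group_b, total_tol=2, vocab_tol=1):
    """Greedily pair each student in group_a with an unused student in group_b.

    A candidate must agree exactly on year, English course and sex and fall
    within the (hypothetical) tolerances on total and vocabulary score; the
    closest candidate is taken. Students with no candidate stay unpaired.
    """
    unused = list(group_b)
    pairs = []
    for a in group_a:
        candidates = [
            b for b in unused
            if (b.year, b.english, b.sex) == (a.year, a.english, a.sex)
            and abs(b.total - a.total) <= total_tol
            and abs(b.vocab - a.vocab) <= vocab_tol
        ]
        if candidates:
            best = min(candidates,
                       key=lambda b: (abs(b.total - a.total), abs(b.vocab - a.vocab)))
            unused.remove(best)
            pairs.append((a, best))
    return pairs


tc = [Student("t1", 1, 1, "F", 80, 20), Student("t2", 1, 3, "F", 110, 30),
      Student("t3", 1, 0, "F", 50, 12)]
arts = [Student("a1", 1, 1, "F", 81, 20), Student("a2", 1, 3, "F", 109, 31),
        Student("a3", 1, 1, "F", 50, 12)]
print([(a.sid, b.sid) for a, b in pair_students(tc, arts)])  # [('t1', 'a1'), ('t2', 'a2')]
```

A greedy pass depends on the order in which students are taken; a pairing that is optimal over the whole group would call for an assignment algorithm instead.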


This table shows that the Teachers College group, which had been subjected to the study technique described in Procedure II, made a small gain over the other group, amounting to 2.32 words, or a per cent gain of 10.3. Apparently there is not much transfer from a situation in which there is a study of a prescribed set of technical words more or less peculiar to a subject to a situation involving a set of words much less technical and much more of a kind to be used in general, without relation to any special subject. Statistically it has small significance and probably means only that, if the same experiment were repeated, the chances are about 91 in 100 that the same results would ensue, or that 91 students out of 100 would benefit by the same procedure to the extent of a gain of 2.32 words in a set of 80 words of the type used in the test.

While the transfer effect to a general vocabulary is thus found to be small, it must not be forgotten that Procedure II, here being tested, did result in measurable gains in the vocabulary of the subject being studied, and that this gain of 2.32 words should be added to the gains found earlier in fully evaluating the method.

The question has been raised repeatedly as to whether there is a positive or negative correlation between initial scores and vocabulary gains. Harlan (22) found a correlation of −.61, and the findings of some others have been quite similar. In Procedure III it was shown that this negative correlation does not hold for groups of English classes taken as wholes, although it was noted that the work in the different groups was probably so different as to make a completely changed situation. Since, however, we have the students equated as to English course, as well as in terms of the other factors described above, the opportunity presented itself of making comparisons with that uncertain element eliminated. So the coefficient of correlation between initial score and gain was computed for the 108 pairs used in this experiment. It was found to be .179 ± .044. This positive finding may be peculiar to the type of words used here or possibly to this particular group. Of course, there may be still other elements not taken into account. For a list of words on which the median initial score was high and on which nearly complete learning was attained, it is plain that those starting with low scores would probably make the greater gains. But in this situation the final median was only about half of the words of the list, no one having a perfect score, and this meant that all had plenty of room for improvement. On the other hand, it appears that students or classes with high ability generally have high initial scores in vocabulary as well as high final scores. This is reflected in the tendency to use vocabulary tests in making up intelligence tests.


A DISCUSSION OF ERRORS 


It is evident from an examination of the 
records of the experimentation so far done 
that there has been no organized or con- 
certed or comprehensive attack upon the 
problem of vocabulary growth among college 
students. Many questions are yet unanswered. 
The set-ups have been so varied and the 
terms of measurements of results so different 
that it is difficult to compare the findings, or 
to draw conclusions of a very general nature. 
The work as outlined thus far in this study 
has, however, added definiteness to some 
aspects of the subject, as shown by the con- 
clusions. It is desired here to note some addi- 
tional considerations growing out of the situ- 
ations incident to this study. 

It is of interest to note how very different 
are the gains made by students of equal abil- 
ity, even in the same English sections. Since 
the instruction is, in the main, at least some- 
what similar in the various sections of the 
same courses, there must be some other 
factors affecting the gains in vocabulary 
growth. It is highly probable that individual 
differences in traits, ability, and experiences 
all combine to bring about some of the ob- 
served dissimilarities. Table X shows the dif- 
ferences in gain in a few cases taken at ran- 
dom. Thus we find that there are marked 
differences in gains where total ability scores 
are practically the same, as with (1) and (2), 
or (9) and (10), or particularly with (14) 
and (15) where the gain in one instance is 
40 words, the highest gain made by anyone, 
and in the other a loss of 2 words. In the 
case of number (11) there is a boy of medium 
ability and high initial score who suffers a 
loss of 19 words, the largest loss encountered 
by anyone. One can scarcely refrain from
speculation as to the cause of such a loss, and 
is impelled to the judgment that lucky guess- 
ing must have played a large part in the in- 
itial score. Also in number (13) is found a 
girl making the highest ability score and a 
high initial score who suffers a loss of 4 
words. It is probable that only a detailed 
diagnosis of each individual would reveal the 
many varied yet specific causes of the extreme 
and unusual results. Certain it is that stu- 
dents found in the same classes and having 
all degrees of general ability also have a great 
variety of capacities for learning words. This 
is in keeping with the observations, long 
made by teachers, that students vary in re- 
tentiveness, speed of learning, thoroughness of 
learning, eagerness to learn, impelling curi- 
osity, and steadfastness of effort. Although 
the group as a whole tends toward a common 
mode of behaviour, any given individual's 
outcome would not be predictable to a very 
high degree. 

In order to get a cross-section view of some
phases of word knowledge on the part of those
taking the Ohio State University Psychological
Test, words 4, 8, 25, 41, 57, and 69 were
traced through 100 papers, taken at random.
A tabulation was made of the items, as
shown in Table XI.


TABLE XI
RESPONSES MADE BY 100 STUDENTS TO CERTAIN
WORDS ON THE O.S.U.P.T.

                      THORN-  NO.        NO.         NO. WHO   NO. WHO    PER CENT
                      DIKE    CORRECT    INCORRECT   PASSED    DID NOT    KNOWING
     WORDS            INDEX   RESPONSES  RESPONSES   IT BY     REACH IT   IT
 4   FRISKY           14      96          3           1         0         96
 8   ILLEGIBLE        14      88          ?           ?         0         88
25   GREGARIOUS       19      29         48          23         0         29
41   ALTRUISTIC       17       ?          ?           ?         ?         23
57   ARISTOCRATIC      7       ?         38           ?         2         36
69   BAWL              8      13         36           2        49         26
It is interesting to note that “frisky” and
“illegible”, both in Thorndike’s 14th 1000,
were known by nearly all, 96 and 88 per cent
respectively; that “aristocratic”, a word in
Thorndike’s 7th 1000, was known by only 36 per
cent; and that “bawl”, in Thorndike’s 8th
1000, was known by only 26 per cent. The
small number of correct responses for the
latter word is probably due to the fact that
the popular notion of the word “bawl” involves
the idea of crying (shedding of tears).
The test gave both “crying” and “yell” from
which to choose, and 36 of the 51 students reaching
this word marked “crying”, whereas the key
gives “yell” as the correct response. While
“gregarious” has the highest index of these six words
on the Thorndike Word List, it is nevertheless known
by a larger per cent of the students than is “altruistic”
or “bawl”. “Altruistic” was passed over
by more students than was any other word, 
and is the least known of the group. This 
would seem to indicate that the word is not 
very much in use among first semester uni- 
versity freshmen, and probably not very much 
used by high school teachers or university 
freshmen instructors. It may be that “altru- 
ism” does not have the place in our philos- 
ophies of living that it formerly had. 

Some curious and interesting side-lights on 
students’ ways of thinking are furnished by 
an examination of some of the errors made in 
the definitions in the tests. In the main these 
errors are segregated, individual affairs. Occa- 
sionally, however, there seems to be a sort of 
mass error. An illustration of this is found in 
the word “exotic”. More than a dozen stu- 
dents used this as a synonym for “glamorous”. 
Upon questioning some of them, it appeared 
that certain movie magazines rather widely 
read, especially at sorority houses, had pub- 
lished a number of articles on foreign movie 
stars in Hollywood. They were referred to 
repeatedly as “exotic”, a proper use of the 
word. Many students, however, interpreted it 
to mean attractive, and so associated it with 
“glamorous”. Clothed with this meaning it 
was rather widely circulated as a desirable 
and impressive addition to the vocabulary of 
certain students. 

In order to see whether errors were made 
entirely on the occasion of the test as a failure 
to recall the meanings of words once known, 
or whether the error was made in the original 
definition as handed in with subsequent fail- 
ure to appropriate the corrections of the 
reader, the errors, in the case of 150 words 
taken from the various procedures, were
traced back to the original definitions. In
only 12 of these cases were there errors in the
original, and in only 3 of these (2 per cent of the
150 words) did the original error persist through to the final test.


Some of the more striking errors are listed
below. The reasons for some of the mistakes
are patent enough; for others they are
obscure. Of course some are merely wild
guesses, a grabbing at a straw. Explanations
are offered in some cases by the writer of this
thesis and are set off by parentheses. These
may not always be the best or correct explanations,
since they are purely subjective.


Abeyance
    Restrainance (invented for “restraining”)
    Obedience (taking “abey” for “obey”)
    Space (possibly thinking of “abysmal”)
    Expectation (probably gets the connection through the idea of awaiting, or holding back)
Aesthetic
    Strong (“athletic” from appearance and sound)
    Unearthly (possibly for “ethereal”)
    Hermit-type (“ascetic”)
    Drug to make unconscious (for “anaesthetic”, a common error)
Ambiguity
    Beauty (possibly from the sound of the last part of word)
    Organ of the heart
    Cheating
    Ability to use either hand (“ambidexterity”)
    Colored (“amber”)
    Overbearing
Animation
    Liking; loving (from “amative”)
Atrophy
    Reward; prize-medal (separating the “a” from “trophy”)
    Dark (possibly for “atropine” from nightshade; or dark-colored from spoiling or rotting; or more likely a pure guess)
    Guidance
Bizarre
    Easy-going
    Sale; social (for “bazaar”)
    Add
    Bare (a nudist might have been called “bizarre”)
Cleavage
    [illegible]
    Lines of age (possibly “lineage”)
    Unfinished [illegible] (“selv...”)
    Distance (Note: This word is often taken as coming from the intransitive verb “cleave” meaning to “cling”, whereas it comes from the transitive verb “cleave” meaning to “split” or “rive”)
Coerce
    Speak; discuss (possibly from “converse”)
    A plan (from “course”)
    Restrain; constrain (negative aspect of “coerce”)
Decadence
    Ten years (“decade”)
Elixir
    Happiness (“elation”)
    To bow to
    Table of contents (“index”)
Elucidation
    To get away from, as a football player from tacklers (“elude”)
    Elimination (purely from the sound)
Exigencies
    Science of (“eugenics”)
Felicity
    Catlikeness; cunning (“feline”)
    Trustworthiness (“fidelity”)
    Crime (“felonious”)
    Wisdom
Fetichism
    To obtain; far-fetched ideas; act of bringing (“fetch”)
    Kind of government (“Fascism”?)
Folderol
    Booklet (“folder”)
    To lay into folds (“fold”)
    Term used in music
Hypothesis
    Stimulate
    Supposity (a pretty good invention)
    Reaction (from “reaction hypothesis”)
Innate
    Without life; dead (“inanimate”)
    Silly (“inane”)
Nonentity
    People over 90 years of age (from the base “nones”)
    Usefulness
Obstreperous
    Expelling, sick, relating to doctors or medicine, a noise in the stomach (“obstetrics”)
Pertinent
    Continuing for a time (“pertinacity”)
    Rude, stuck-up, saucy (“impertinent”)
    Being alert or pert (from the first part of the word)
Phenomena
    A species of animal
    A sickness of the lungs (“pneumonia”)
    Some kind of a drug
Philanthropic
    Belonging to the monkey family (“anthro...”)
    Bringing up offspring
    Study of plants
Presaged
    Wise [illegible] (from “pre” and “sage”)
    Forced (“p...”)
    The presaged students often don’t know their vocabulary
Prognostic
    She had a prognostic personality
    The prognostic boy went whistling down the street
Pylorus
    The pylorus condition of many is pitiful
    She made grades by a pylorus amount of study (“prodigious”)
Recapitulation
    Recovery (from “recapture”)
    Refinancing (from “recapitalizing”)
    New headdress; putting the head on again after it has been taken off (from “re” and “caput”)
    Recapturing
    Renewed life (“recuperation”)
Sedentary
    Sediments
    Alone (“solitary”)
    Second (“secondary”)
    Farming is sedentary work (farmers may ride much these days)
Stereographs
    Study of stars (possibly from “graph” of stars)
    A typewritten page; instruments which make copies (“stereotype”)
    An instrument with which to listen to the lungs (“stethoscope”)
Vestiges
    Servants (“vassals”?)
    Clothes (“vestments”)
    Trips or journeys (possibly “visitation”)
    To be alone
    Savages
    Breathing places (“vesicles”)
    He vestiges everyone that he meets (“investigates”)
    The vestiges of trusts are not always safe (“investments”)
Viands
    Glands of body; “duckless” glands (possibly from “endocrine”)
    Deficient people; fools
    Crossings (probably from the Latin “via”)
    Sellers (“venders”)
    Bunches; groups
    Places of abundance
    Vials and vandals


A survey of the errors found in the definitions
offered in the tests leads to the observation
that many, probably most, of them can
be classified in a rather small number of categories.
These errors may be due to a multiplicity
of causes, but certainly among the chief
ones are the following:

1. Faulty visual imagery
2. Faulty auditory imagery
3. Poor or careless reading
4. Defective analysis and faulty reasoning
5. Failure to appropriate the evident
6. Evasion
Below is an attempted classification of
many of the errors noted, with the probable
reasons enclosed in parentheses, by number
referring to the above list:

Similarity of appearances or sound or both
(1, 2)

“Bazaar” for “bizarre”; “anaesthetic” for
“aesthetic”; “portent” for “potent”; “depreciate”
for “deprecate”; “corroborate”
for “collaborate”; “folder” for “folderol”.


Disregard of parts of speech, tense, number,
etc. (3, 5)

“Philanthropic” defined as “study of plants”;
“criterion” as “to make easy”; “frustrate”
for “frustration”; “stimulate” for
“hypothesis”; “a noise in the stomach” for
“obstreperous”; “decrease” for
“decreasing”; “frisking” for “frisky”.

It seems odd that students would fail to take
cognizance of the hints that often lie in
the forms of speech. For instance, in the
Ohio State University Vocabulary Test,
often two and sometimes three of the five
words offered as being “the same as” or
“the opposite of” may be eliminated by
this process alone. For example, in “bewitch”
the same as interest, charm, bewitching,
entice, enslaved, the choices bewitching and
enslaved are clearly out of the picture on
account of form; or in “Herculean” the
same as impregnable, task, difficult,
hardly, Hercules, the choices task, hardly, and
Hercules can be eliminated on account of
form.
Substitution of words commonly associated
(1, 2, 5)

“Spontaneous” for “combustion”; “hypothesis”
for “reaction”; “delinquent” for
“bills”, “taxes”, etc.; “glamorous” for
“shiny”.

Substitution of agent for quality or agent for
the product (4)

“Judge” for “criterion”; “stereograph” for
“stereoscope”; “machine” for “lexicographer”.


Confusion of stem (1, 2, 4, 5)

“Decadence” for “decade”; “untenable” for
“untenantable”; “perusal” for “pursual”.

Failure to recognize the significance of prefixes
or the lack of prefixes (5)

“Atrophy” for “reward” or “trophy”;
“capitulation” for “recapitulation”; “extemporaneous”
for “contemporaneous”;
“impertinent” for “pertinent”.

Using some form of the word in the definition,
especially with negatives (6)

As “intangible equals that which is not
tangible”; “unambiguous” defined as “not
ambiguous”; “analytical” as “that which
can be analyzed”; “indolent” as “not
dolent”.

Inventions (6)

Restrainance, acculation, supposity

Note: It seems evident that students some- 
times invent terms that probably have in 
their minds some hazy connection with the 
word they are trying to define or some 
element of sound or appearance in com- 
mon with it. 

Guessing (6)

“Potent” defined as “on time”.

“Elixir” defined as “to bow to”. 

Note: There is no way by which one can 
always be sure that either a correct or an 
incorrect answer has not been a guess. 
When no possible connection between the 
word and the definition can be discovered, 
it is rather convincing that the effort was 
a guess. 
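The classification above pairs each category of error with cause numbers from the list of six chief causes. This is a sketch only: the scheme can be held in two Python dictionaries, so that the causes behind a category, or the categories that cite a cause, can be looked up. The dictionary and function names are illustrative and not the writer's. The category labels follow the text, shortened in two places.

```python
# The six chief causes, numbered as in the text.
CAUSES = {
    1: "Faulty visual imagery",
    2: "Faulty auditory imagery",
    3: "Poor or careless reading",
    4: "Defective analysis and faulty reasoning",
    5: "Failure to appropriate the evident",
    6: "Evasion",
}

# Each error category with the cause numbers assigned to it in the text.
CATEGORIES = {
    "Similarity of appearances or sound or both": (1, 2),
    "Disregard of parts of speech, tense, number, etc.": (3, 5),
    "Substitution of words commonly associated": (1, 2, 5),
    "Substitution of agent for quality or agent for the product": (4,),
    "Confusion of stem": (1, 2, 4, 5),
    "Failure to recognize the significance of prefixes": (5,),
    "Using some form of the word in the definition": (6,),
    "Inventions": (6,),
    "Guessing": (6,),
}

def probable_causes(category):
    """Names of the causes the classification assigns to one category."""
    return [CAUSES[n] for n in CATEGORIES[category]]

def categories_citing(cause_number):
    """Categories (in the order listed) that cite the given cause."""
    return [name for name, nums in CATEGORIES.items() if cause_number in nums]

print(probable_causes("Confusion of stem"))
print(categories_citing(6))
```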


SUMMARY 


The series of studies recorded above seems
to point toward the following conclusions:


1. Advanced college students acquire about 
half of the vocabulary of their subjects 
under ordinary class procedure, and may 
acquire three-fourths under a program 
where impending tests are used to moti- 
vate the learning of vocabulary. 


2. Freshmen, under a program emphasizing
vocabulary study but not motivated by
a testing program, will know about 40
per cent of the technical words of a new
subject.


3. Freshmen in Arts and Science and in 
Teachers College, in the ordinary course 
of their college work during a semester, 
do grow to the extent of about 13 words 
in a list of 80 of the general type of 
words represented by the Ohio State 
University Psychological Test, Part I. 
If Terman’s estimate of 11,700 words 
for the average adult is correct, these 
13 words out of 80 might mean a total 
growth of well up toward 2000 words. 


4. The higher the native ability of college
students, as measured by the Ohio State
University Psychological Test, the
higher the vocabulary ability and the
more extensive the gains during studentship.


5. The effect of a growth in vocabulary,
due to the stimulation of a special
method in a special subject, does not
carry over very largely into the acquisition
of a general vocabulary; that is,
the transfer is not large.
6. Errors may be classified into a number 
of categories, the apparent reasons for 
which are faulty visual imagery, faulty 
auditory imagery, poor or careless read- 
ing, defective analysis and faulty rea- 
soning, failure to appropriate the evi- 
dent, evasion, or any combinations of 
these. 
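Conclusion 3 extrapolates from the sample list to the whole vocabulary by simple proportion: 13/80 of Terman's 11,700 words is about 1,901, which is "well up toward 2000". The sketch below makes that arithmetic explicit. Its function name is hypothetical. Like the text's own "if", it assumes the 80-word list is representative of the adult vocabulary as a whole.

```python
def estimated_total_growth(gain_on_sample, sample_size, total_vocabulary):
    """Scale a gain on a sample word list up to a whole vocabulary.
    Valid only if the sample list is representative of that vocabulary."""
    return gain_on_sample / sample_size * total_vocabulary

# 13 words gained on an 80-word list; Terman's estimate for the average adult.
growth = estimated_total_growth(13, 80, 11_700)
print(round(growth))  # 1901
```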


An attempt to prevent pupils from falling
into such errors as are here portrayed seems
to be well worth while on the part of elementary
and secondary teachers.

In general, it appears that college students
learn measurable amounts of the words
peculiar to their subjects and also that they
grow in general vocabulary, but that this
learning is appreciably increased under a
program of directed effort to obtain such
learning.
READING COMPREHENSION AMONG COLLEGE STUDENTS*
Eastern Illinois State Teachers College


The Problem.—The problem is to deter- 
mine the improvement of the silent reading 
comprehension of a group of undergraduate 
students who have received special training 
during an eleven-week period and who have 
had conferences with regard to personal
adjustment to the college situation. 

The study was delimited in five ways:

First, attention was concentrated on the
improvement of reading comprehension,
eliminating direct emphasis on rate of read- 
ing. Restriction of emphasis to either compre- 
hension or rate of reading is in accordance 
with current practice among colleges and uni- 
versities where remedial reading is a perma- 
nent part of the academic program. The ad- 
visability of concentrating effort in an inten- 
sive program upon one or the other of these 
two factors is indicated by certain recent 
studies which show a lack of high correspond- 
ence between rate and comprehension. 

Second, the individuals selected for the 
study were chosen from a group of approximately
300 students enrolled in foundation
courses in educational psychology (Educa- 
tional Psychology 035.19 and Educational 
Psychology 035.13) in the School of Educa- 
tion, New York University. The group in- 
cluded freshmen, sophomores, juniors, and 
seniors ranging in age from 16-2 years to 
38-2 years (average age 19-5). 

Third, the intensive study herein described 
was confined to 84 undergraduate students in 
order to secure case study data in detail and 
to center remedial work around individual 
disabilities and needs. From this group of 84 
students, 50 were selected to constitute the 
experimental group for the purpose of anal- 
ysis of the data obtained. 

Fourth, remedial instruction was given to
the experimental group in conjunction with
the class work in educational psychology. No
extra group instruction periods were scheduled,
but individual conferences occurred
during the students’ free periods. In this way,
aid in reading comprehension was an integral
part of the course rather than an extra burden
added to students’ academic schedules.

* Abstract of a thesis submitted in partial fulfillment of the
requirements for the degree of Doctor of Philosophy in the
School of Education of New York University, 1939. An experiment
to determine the improvement in reading comprehension
of 84 undergraduate students enrolled in the Department
of Educational Psychology, New York University.

Fifth, the training period covered 11 weeks 
exclusive of time devoted to testing and re- 
testing. It was advisable to condense this 
work in reading into one semester because of 
the diversity of individual course requirements 
and the resulting general redistribution at the 
beginning of a new semester. 

Previous Investigations—The review of 
previous investigations relative to the problem
was divided into four chronological
periods: 1800-1878; 1879-1910; 1911-1924;
and 1925-1938. The investigations in each 
period were grouped under eight topics: eye 
movements, perception, oral and silent read- 
ing, rate and comprehension, inner speech, 
reading tests, disability diagnosis, and reme- 
dial programs. 

The over-all view of previous investigations 
showed that there were three trends or devel- 
opments during the past 138 years. First, at- 
tention was directed to the analysis of the 
physiological and psychological factors in- 
volved in reading. It then followed that the 
results of these analyses were applied to ele- 
mentary, high school, and college levels. The 
most recent development was the critical eval- 
uation of testing devices, diagnostic methods, 
and remedial reading techniques. 

Certain names are recurrent in the litera- 
ture in this field. In the study of eye move- 
ments, Huey, Dodge, Judd, Dearborn, C. T. 
Gray, Tinker, and Buswell made noteworthy 
contributions. Important studies in percep- 
tion have been made by Cattell, Dearborn, 
Erdmann and Dodge, Orton, and Marion 
Monroe. Differences between oral and silent 
reading have been demonstrated by Judd, 
Pintner and Gilliland, and Buswell. Investi- 
gations in the relationship between rate and 
comprehension in silent reading have been 
made by Judd, Buswell, and C. T. Gray. 
Reading and vocabulary tests have been developed
by Whipple, W. S. Monroe, Thorndike,
Inglis, S. L. and L. C. Pressey, Gates,
Eurich, Nelson and Denny, and Green, Jor- 
genson, and V. H. Kelley. Notable diagnostic 
techniques have been developed by Whipple, 
W. S. Gray, Betts, Gates, E. A. Taylor, and 
Marion Monroe. 


Current Practice.—The discussion of current
practice in the field was divided into
three parts. First, a summary was given of 
Parr’s study in 1929 of the extent of remedial 
reading in state universities. 

Second, Strang’s study in 1937 of 155 col- 
leges was summarized. 


Third, information was presented which 
had been collected from a questionnaire sent 
out by the investigator in 1937, prior to the 
publication of Strang’s report. Questionnaires 
were sent to the accredited colleges and uni- 
versities in the United States. Replies were 
received from 75.7 per cent (318) of the
institutions to which inquiries were mailed. Of
the institutions from which replies were
received, 35.2 per cent use a diagnostic reading
test for freshmen. The most frequently re- 
ported test was the Iowa Silent Reading Test, 
Advanced Form. In 31.8 per cent of the in- 
stitutions, reading tests are used with students 
whose academic marks are low. The follow-up 
program with freshmen is carried on in a 
remedial reading class in 41.7 per cent of the 
institutions. Only 7.5 per cent of the insti- 
tutions have how-to-study classes. Data con- 
cerning follow-up programs and how-to-study 
training are presented in Tables I and II. 

The diversity of opinion and practice
among colleges with respect to reading tests, 
remedial programs, and how-to-study training 
was revealed by comments among the ques- 
tionnaire replies. 

Among the detailed reports received, those
from Goucher, Mount Holyoke, Dartmouth, 
and the University of Louisville were partic- 
ularly informative. 
TABLE II
FRESHMAN TRAINING IN HOW-TO-STUDY

                                       COLLEGES AND     TEACHERS         ALL
                                       UNIVERSITIES     COLLEGES         INSTITUTIONS
                                       (102)            (71)             (173)
METHOD                                 NO.  PER CENT    NO.  PER CENT    NO.  PER CENT
ORIENTATION COURSE                     17   16.7        11   15.5        28   16.2
ENGLISH COURSE                         14   13.7        13   18.3        27   15.6
INDIVIDUAL CONFERENCES                 20   19.6         6    8.5        26   15.0
  ...AN                                 8                2               10
  ...MISCELLANEOUS PERSONNEL OFFICERS  10                2               12
  ...RS                                 2                2                4
REMEDIAL READING CLASS                  9    8.8         2    2.8        11    6.4
HOW-TO-STUDY CLASS                     11   10.8         2    2.8        13    7.5
  WITH ACADEMIC CREDIT                  2                -                2
  WITHOUT ACADEMIC CREDIT               9                2               11
...CONFERENCES                         12   11.8         5    7.0        17    9.8
  WITH THE DEAN                         3                1                4
  WITH THE LIBRARIAN                    1                2                3
  ...UPPER CLASSMEN                     6                2                8
  UNSPECIFIED                           2                -                2
MISCELLANEOUS COURSES                  24   23.5        28   39.4        52   30.1
  INTRODUCTORY EDUCATION                -                8                8
  ...PSYCHOLOGY                         -                1                1
  INTRODUCTORY PSYCHOLOGY               3                3                6
  PSYCHOLOGY OF READING                 1                -                1
  ...                                   1                -                1
  CONTEMPORARY CIVILIZATION             1                -                1
  INTRODUCTION TO TEACHING              1                -                1
  UNSPECIFIED                          17               16               33
...LABORATORY                           -    -           2    2.8         2    1.2
...CLINIC                               2    2.0         1    1.4         3    1.7
...WEEK LECTURES                        6    5.9         1    1.4         7    4.0

...OR MORE METHODS.
...INSTITUTIONS SUPPLYING THE ABOVE INFORMATION.
Procedure.—A group of 279 undergraduates
enrolled in the basic psychology courses
in the School of Education was tested. Tests
in three areas were used:

1. The Henmon-Nelson Test of Mental Ability.

2. Whipple’s High School and College Reading Test.

3. The Inglis Test of English Vocabulary.

In addition, the Sims Score Card for Socio-Economic
Status was administered.


The Control and Experimental Groups were
selected from the 279 students to whom the
initial tests were administered. Each group
consisted of fifty persons. Each member of
the Experimental Group was paired with a
member of the Control Group on the basis of
six criteria: (1) nationality, (2) sex, (3) academic
year, (4) chronological age, (5) score
on the Sims test, and (6) an average of the
individual sigma scores on the mental ability, 
reading, and vocabulary tests. The pairs were 
identical with regard to nationality, sex, and 
academic year. The largest difference in age 
between two subjects was 8.0 months, and the 
average difference was 2.9 months. There was 
an average difference between subjects on the 
Sims test of four points. No student fell below 
the “medium high” designation on this test. 
The largest difference between any two mem- 
bers of a pair on average sigma score was .5 
of a point. The average difference was .166 
of a point. 
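The sixth matching criterion averages each student's sigma (standard) scores on the three tests. The Python sketch below illustrates it together with a pairing check built on the tolerances the study reports. It is not the investigator's procedure. The use of the population standard deviation, the dictionary keys and the function names are all assumptions. The Sims criterion is omitted because no tolerance is given for it.

```python
from statistics import mean, pstdev

def sigma_scores(raw):
    """Sigma (z) scores: (score - mean) / standard deviation of the group.
    Uses the population standard deviation (an assumption); the scores
    must not all be identical, or the deviation is zero."""
    m, sd = mean(raw), pstdev(raw)
    return [(x - m) / sd for x in raw]

def average_sigma(tests):
    """tests: one list of raw scores per test, students in the same order.
    Returns each student's mean sigma score across the tests."""
    per_test = [sigma_scores(scores) for scores in tests]
    return [mean(student) for student in zip(*per_test)]

def acceptable_pair(a, b, max_age_months=8.0, max_sigma_diff=0.5):
    """Hypothetical matching rule: exact agreement on nationality, sex and
    academic year, plus tolerances on age and average sigma score equal to
    the largest differences reported in the study."""
    if any(a[key] != b[key] for key in ("nationality", "sex", "year")):
        return False
    return (abs(a["age_months"] - b["age_months"]) <= max_age_months
            and abs(a["avg_sigma"] - b["avg_sigma"]) <= max_sigma_diff)

# Made-up raw scores: three tests (mental ability, reading, vocabulary),
# three students each.
scores = [[40, 50, 60], [30, 35, 40], [70, 80, 90]]
print(average_sigma(scores))
```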

At the close of the training period, the 
Control and Experimental Groups were re- 
tested with alternate forms of the Henmon— 
Nelson, Whipple, and Inglis tests. The Con- 
trol Group underwent no remedial training. 

The remedial program for the Experimental 
Group included the administration of four 
tests: (1) Betts’ Visual Sensation and Perception
test administered with the Keystone
Telebinocular, (2) the Ophthalmograph test, 
(3) Gray’s Oral Reading Paragraphs Test, 
and (4) the Iowa Silent Reading Test, Ad- 
vanced Form. One form of the Iowa test was 
administered at the beginning of the training 
period, and the other form at the close of the 
period. Each member of the Experimental 
Group knew the results of each of these tests. 


The investigator conferred individually 
with each member of the Group. To facilitate 
these conferences the investigator prepared 
and administered two questionnaires. The 
first dealt with personal data concerning 
nationality, schools attended, development of 
reading habits, handedness, present reading 
habits, likes and dislikes of high school and 
college subjects, and personality adjustment. 
The second dealt with time distribution and 
included: hours spent in classes, in prepara- 
tion for classes, in sleeping, eating, recreation, 
gainful employment, and commuting to and 
from the University. 

Six phases of the reading problem were 
covered during the training period. Each 
phase was presented orally and in outline form 
to the group. The six outlines were entitled: 
(1) aids in reading comprehension, (2) pur- 
poses of reading, (3) how to increase your 
vocabulary, (4) vocabulary and success,
(5) mechanics of reading, and (6) diagnosis 
of reading disability. 


The Experimental Group received a list of 
suggested books and periodicals for supple- 
mentary reading in connection with the psy- 
chology course requirements. During the 
training period two written reports on supple- 
mentary readings were handed in. 


Three exercises from Strang’s Study Type 
of Reading Exercises were administered dur- 
ing the training period. Each student received 
a report on his comprehension and rate scores 
on these tests. 


An illustrated lecture on the use of the 
library was given to the group by the Chief
of the Reader’s Department at the Washington
Square Library, New York University.

Results.—The results of the training experiment
were summarized by comparing the
initial and re-test mean scores of the Experimental
Group with those of the Control
Group. These data are presented in Table III.

On the Henmon-Nelson Test of Mental
Ability the Experimental Group made a significant
gain. The average gain was 1.7
points. The Control Group made an average
gain of .16 points, which was not statistically
significant.

On Whipple’s High School and College
Reading Test the Experimental Group made
an average gain of .72 points. The chances
are 96 in 100 that this is a significant gain.
The Control Group made an average gain of
.48 points, which is not significant.
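The study reports significance as "chances in 100" but gives neither the standard deviation of the gains nor the method used. The usual practice of the period was to divide the mean gain by its standard error (the critical ratio) and read a one-tailed normal probability. Under that assumption, the Python sketch below shows the conversion; it is a reconstruction, not the investigator's computation. A critical ratio of about 1.75 corresponds to 96 chances in 100.

```python
import math

def critical_ratio(mean_gain, sd_of_gains, n):
    """Mean gain divided by its standard error, sd / sqrt(n)."""
    return mean_gain / (sd_of_gains / math.sqrt(n))

def chances_in_100(cr):
    """One-tailed normal probability, times 100, that the true gain exceeds zero."""
    return 100 * 0.5 * (1 + math.erf(cr / math.sqrt(2)))

print(round(chances_in_100(1.75)))  # 96
```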

On the Inglis Test of English Vocabulary 
the Experimental Group made a significant 
gain. The average gain was 11.78 points. The 
Control Group also made a significant gain.
The average gain for the Control Group was 
13.92 points. 

Conclusions.—Four conclusions were formulated:

(1) Silent reading comprehension can be
improved by training during one semester in
connection with a course in educational psychology.
This fact has been demonstrated by
a comparison of initial and re-test scores on
the Iowa Silent Reading Test and by a comparison
of results from initial and re-tests administered
to an Experimental Group with the
results from testing a Control Group. The
training program required little more than
one hour of each student’s time in excess of
class periods and routine class assignments.

(2) The Betts Visual Sensation and Per- 
ception Test is a definite aid in a remedial 
reading program. In this investigation there 
is not a high correlation between Betts test re- 
sults and reading difficulty as indicated on 
other tests. However, the Betts Test shows up 
the need for further eye examinations. In two
cases in this investigation the test revealed
previously unrecognized but serious trouble.

TABLE III
TOTAL AND AVERAGE GAINS MADE BY EXPERIMENTAL AND CONTROL GROUPS ON ALL TESTS

                 Henmon-Nelson       Whipple             Inglis
                 Total   Average     Total   Average     Total   Average
Group            Gain    Gain        Gain    Gain        Gain    Gain
Experimental     85      1.70        36      .72         589     11.78
Control           8       .16        24      .48         696     13.92
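The average gains in Table III should simply be the total gains divided by the fifty members of each group. The short Python check below uses illustrative names. It reproduces the averages quoted in the text, including the Inglis averages of 11.78 for the Experimental Group and 13.92 for the Control Group.

```python
GROUP_SIZE = 50  # each group consisted of fifty persons

def average_gains(total_gains, n=GROUP_SIZE):
    """Average gain per person on each test, rounded to two places."""
    return [round(total / n, 2) for total in total_gains]

# Total gains from Table III, in the order Henmon-Nelson, Whipple, Inglis.
print("Experimental", average_gains((85, 36, 589)))  # [1.7, 0.72, 11.78]
print("Control", average_gains((8, 24, 696)))        # [0.16, 0.48, 13.92]
```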

(3) The Ophthalmograph tests supply the 
student and the instructor with valuable objective
information which is not furnished by
pencil-and-paper reading tests. Evidence of
regressions, lengthy fixations, and unevenness
of reading helps the student to understand his 
reading problem. The graph gives the in- 
structor assistance in developing an individu- 
alized remedial program. 

(4) Emphasis upon the improvement of 
reading comprehension does not impede im- 
provement in the rate of reading. This fact is 
demonstrated by the gains shown on the rate 
test in the Iowa Silent Reading Test.

By-Products.—From the questionnaires administered
to the Experimental Group, five
items of information were gathered which 
were presented as by-products of the study: 


(1) Little voluntary reading of books is 
done by the students. The average number of 
books reported to have been read during the 
preceding eight months was 1.3, and 97 per 
cent of the books mentioned were fiction. The 
period covered by the “preceding eight 
months” was from June through January. 

(2) Comparison of the amount of reading 
done (books and magazines) and the indi- 
vidual average sigma score (an average of 
reading, vocabulary, and mental ability test 
scores) showed little relationship between the 
two. Also there was found to be little relation 
between the number of books owned and the 
average sigma score. 

(3) An average of 1.2 hours is spent by
the student in preparation for each credit
hour.

(4) There is a wide range (from 0 to 10
hours) in the number of hours spent in class
per day.

(5) On every day except Saturday, there 
were some individuals who had no lunch 
period. A student was considered to have no 
time scheduled for lunch when his classes ran 
through from 10:00 A.M. until 2:00 P.M. or
later, or from 11:00 A.M. until 3:00 P.M. or
later.

Applications.—Suggested applications of
the investigation were presented under six
topics:

(1) The continued inclusion of the study 
of reading techniques and of the improvement 
of individual reading habits would enrich 
the basic courses in the Department of Educational
Psychology for two reasons:
(a) Efficiency in reading involves the forma- 
tion of important habits of thought and of 
eye movement. The understanding of the 
processes of habit formation and learning is 
an important part of these basic courses. 
(b) Students in the process of analyzing and 
improving their own reading habits have a 
situation within their immediate experience 
upon which to base the study of the learning 
process. 

(2) The use of available instruments for 
the detection of possible lacks in visual acu- 
ity and reading efficiency is beneficial to col- 
lege students. The assumption of responsibil- 
ity by faculty and administration for investi- 
gation into these student problems would 
facilitate individual adjustment to the college 
situation. 


(3) The small amount of voluntary (not 
required) reading done by the students in this 
investigation is significant. Motivation for 
leisure time reading of worthwhile books 
should come from the men and women under 
whom these students take their undergraduate 
work. The responsibility for developing an 
enthusiasm for reading rests with the student; 
the incentive must come from the instructor. 


(4) The investigation revealed that an 
average of 1.2 hours is spent by students in 
preparation for each credit hour. (It is un- 
likely that the students underestimated the 
time which they spent in preparation for 
classes.) For a student carrying 15 credit
hours, this means 33 hours per week devoted
to academic pursuits (18 hours of preparation
and 15 hours in class). A survey of the time
spent in preparation for and attendance at 
classes by a large sampling of the undergrad- 
uate students in the University would furnish 
data for recommendations with regard to in- 
creases and decreases in work required under 
the various curricula. 

(5) The distribution of daily class hours 
is very irregular. It is doubtful that any 
undergraduate should be permitted to spend 
ten hours in one day in classes. These findings 
could be used as a basis for consideration of 
revision of the procedure of approving sched- 
ules at the time of registration. 

(6) Some students have no lunch period. 
Inquiry by the investigator revealed that 
many of these students manage to eat some- 
thing in the middle of the day. Time for a 
sandwich and a cup of coffee is secured by 
leaving one class early and going to the next 














108 JOURNAL OF EXPERIMENTAL EDUCATION 


class several minutes late. It is suggested 
that the situation be investigated further in 
view of the responsibility of the institution 
for the welfare of the students. 
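The 33-hour figure in application (4) combines preparation time with class attendance. The sketch below redoes that arithmetic for other course loads. It assumes one hour in class per week for each credit hour, which is what the 15-credit example implies. The function and constant names are hypothetical.

```python
PREP_HOURS_PER_CREDIT = 1.2  # average reported in this investigation


def weekly_academic_hours(credit_hours, prep_per_credit=PREP_HOURS_PER_CREDIT):
    """Hours per week spent attending and preparing for classes.

    Assumption: one hour in class per week for each credit hour.
    """
    in_class = credit_hours
    preparation = credit_hours * prep_per_credit
    return in_class + preparation


if __name__ == "__main__":
    for load in (12, 15, 18):
        hours = round(weekly_academic_hours(load), 1)
        print(load, "credit hours ->", hours, "hours per week")
```

With a 15-credit load this gives 15 hours in class plus 18 hours of preparation, or 33 hours per week.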

Included in the complete study are a
bibliography and an appendix containing
samples of tests, questionnaires, and record 
forms. A selected bibliography follows the 
present discussion. 
SEX DIFFERENCES IN SPEED OF READING*


Joseph E. Moore


George Peabody College for Teachers 


The question concerning possible sex dif- 
ferences in speed of reading is at present 
unsettled. The investigations show
contradictory findings. Berman and Bird¹
administered the Chapman-Cook Speed of Reading
Test to 1095 subjects at the college level and
found the women to be significantly superior
to the men in speed of reading, although not
superior in intelligence rating. Traxler² in a
more recent study applied the Iowa Silent
Reading Test to 256 boys and 283 girls at
the high school level and found no statistically
reliable differences, although the boys were
slightly superior in speed of reading at the
sub-freshman, freshman, and senior levels.

This lack of agreement seemed to warrant 
a further study of the question of sex differ- 
ences in rate of reading which would include 
a sampling of students representing a wider 
range in educational status than that found 
in the former studies. The subjects included 
in this investigation were chosen from each
consecutive grade or class from the second 
year in junior high school through the senior 
year in college, inclusive. 

There were 2733 subjects in the present 
study, 1215 boys and 1518 girls. Of this num- 
ber 607 were college boys and 765 college 
girls. The entire eighth and ninth grades of a 
junior high school, the entire student body of 
a high school, and the senior class of another 
high school, all in Nashville, Tennessee, were 
tested. The college subjects were largely psy- 
chology and sociology students from the Uni- 
versity of North Carolina, the University of 
Idaho, North Carolina State College, Ashe- 
ville State Normal College, Louisiana Poly- 
technic Institute, and George Peabody Col- 
lege for Teachers. There were 100 or more 


* The interest and cooperation of the following individuals 
made this study possible: Mr. W. A. Bass, Superintendent,
Nashville City Schools; Dr. E. A. Hoskins, Asheville State
Normal College, Asheville, North Carolina; …
Brooks, University of North Carolina, Chapel Hill, North
Carolina; Professor William McGehee, North Carolina State
College, Raleigh, North Carolina; Dr. William H. Boyer,
University of Idaho, Moscow, Idaho; and Professor Aubrey
Bickley, Louisiana Polytechnic Institute, Ruston, Louisiana.

¹ Isabelle Berman and Charles Bird, "Sex Differences in
Speed of Reading," Journal of Applied Psychology, XVII
(1933), 221-26.

² Arthur E. Traxler, "Sex Differences in Rate of Reading in
the High School," Journal of Applied Psychology, X…
(…), …-52.


subjects of each sex at each class level except 
in the cases of eighth grade boys (79) and 
junior (92) and senior (64) college men. 


The sampling at the college level was not 
entirely satisfactory because, in order to have 
in each group at least one hundred subjects 
of each sex, students from a college for girls 
had to be compared with students from a 
college for men. 


Procedure. The measuring instrument
used was the Van Wagenen Rate of Comprehension
Test, Form B, which is part of the
battery known as the Unit Scales of Aptitude.³
This test was selected because it contained
a rather large number (56) of paragraphs
of approximately equal difficulty and
allowed a relatively short time (five minutes) 
for actual testing. The practice exercises were 
easily understood and offered splendid motivation
for the main test, the scoring was objective,
and the type of material was sufficiently
simple for all the poor readers to
complete at least one paragraph, yet difficult 
enough to measure all except nineteen of the 
subjects who completed correctly the entire 
test before the time limit was up. 

This test is quite similar to the Chapman- 
Cook Speed of Reading Test in that there is 
one word which denies the thought in each 
paragraph. The investigator personally gave 
or supervised the giving of approximately 
fifteen hundred of the tests. All the tests were 
administered by experienced testers. 

The data presented in Table I show the 
number of boys and girls at each grade or 
class level included in the study, the median 
and mean number of paragraphs read, the 
standard deviation, the standard error of the 
mean, the significance of skewness, and the 
range of paragraphs read. 
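Most of the columns of Table I are standard descriptive statistics of the number of paragraphs read. The sketch below computes them for one group. It assumes the population standard deviation and σ/√N for the standard error of the mean; the article does not say which variants were used. Skewness is left out because the article does not give the formula behind its skewness column. The function name and sample data are hypothetical.

```python
import math
from statistics import mean, median, pstdev


def summarize(paragraphs_read):
    """Summary statistics for one group's paragraph counts.

    Assumptions: population standard deviation, and sd / sqrt(N) for
    the standard error of the mean.
    """
    n = len(paragraphs_read)
    sd = pstdev(paragraphs_read)
    return {
        "n": n,
        "median": median(paragraphs_read),
        "mean": mean(paragraphs_read),
        "sd": sd,
        "se_mean": sd / math.sqrt(n),
        "range": (min(paragraphs_read), max(paragraphs_read)),
    }


if __name__ == "__main__":
    # Hypothetical scores, not data from the study.
    print(summarize([18, 22, 25, 25, 27, 31, 35]))
```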

The most striking fact which appears from 
these data is the consistent superiority of the 
girls in number of paragraphs read up until 
the junior year of college. The boys in only 
the junior and senior college classes surpass 
the girls in reading speed. 


³ M. J. Van Wagenen, Unit Scales of Aptitude. Minneapolis:
Educational Test Bureau.
The standard error of skewness indicates
that the departure from normality (Sk = .00),
although great in several cases, is statistically
significant only in the scores of the eighth
grade girls and in the scores of the total
groups of both sexes. In three instances the
distributions were skewed negatively, in one
instance favoring the girls and in the other
two favoring the boys. The significance of
skewness for each of the combined groups
indicates that there is a greater piling up of
low scores among the girls than is the case
among the boys. The comparatively positive
skewness of the girls' scores, which pulls their
means downward, would seem to make
differences in their favor more significant.

The range of scores indicates a marked 
variation in the performance of subjects of 
both sexes. From Table I it will be noticed 
that the score of one eighth grade girl 
equalled the highest score made by a girl in 
the college freshman class. 

The standard deviation of the distributions 
presented in Table I indicates that in general 
the girls show more variation in reading speed
than do the boys. The range of scores also
supports the same conclusion. The greater
range of scores for the girls is not confined to 
the lower grades, but in these data is equally 
divided between the junior and senior high 
school grades and the college classes. 

Figure 1 shows more clearly the sex differ- 
ence in the average number of paragraphs 
read correctly. 

The most striking fact revealed in Figure 1 
is the uniformity of the curve showing the 
girls’ rate of reading, extending to the junior 
year in college. In contrast to the smooth 
gradual increase in the girls’ speed of reading 
is the irregular curve showing the boys' reading
speed. If the sampling in this study can
be considered representative of the student
population from which it is drawn, it
indicates that the average reading speed of the girls
is consistently superior to that of the boys
from the eighth grade in junior high school through
the sophomore class in college.

The apparent difference which can be seen 
in Figure 1 should not be taken as real until 
shown to be statistically reliable. In Table II 
are presented the mean number of paragraphs 
read by the subjects of both sexes at each 
grade level, the differences between the means, 
and the reliability of the differences. 
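The last columns of Table II follow from the first. A critical ratio is the difference between two means divided by the standard error of that difference, and the "chances in 100" column agrees, to within rounding, with 100 times the one-tailed normal probability that the true difference exceeds zero. The sketch below reproduces that arithmetic for a few rows of the table. The function names are hypothetical, the normal-curve reading of the last column is inferred from the tabled values rather than stated in the article, and `se_difference` assumes independent samples (the article reports only the resulting standard errors).

```python
import math


def se_difference(se_mean_1, se_mean_2):
    """Standard error of the difference between two independent means."""
    return math.sqrt(se_mean_1 ** 2 + se_mean_2 ** 2)


def critical_ratio(difference, se_diff):
    """Difference between means divided by its standard error."""
    return difference / se_diff


def chances_in_100(cr):
    """One-tailed normal-curve chances in 100 that the true difference exceeds zero."""
    phi = 0.5 * (1.0 + math.erf(cr / math.sqrt(2.0)))
    return 100.0 * phi


# (grade, mean of boys, mean of girls, S.E. of difference), from Table II.
ROWS = [
    ("Eighth", 21.24, 24.33, 1.11),
    ("Twelfth", 28.86, 29.45, 0.24),
    ("Freshman", 29.64, 29.94, 0.24),
    ("Senior", 32.19, 30.90, 1.23),
]

if __name__ == "__main__":
    for grade, boys, girls, se in ROWS:
        diff = abs(girls - boys)
        cr = critical_ratio(diff, se)
        print(f"{grade:10s} diff={diff:.2f} C.R.={cr:.2f} chances={chances_in_100(cr):.0f}")
```

Using the tabled standard errors, the computed ratios match Table II to within rounding. For example, the freshman row gives .30/.24 = 1.25 against the tabled 1.24, presumably because the tabled difference and standard error are themselves rounded.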

It appears from Table II that only four of 
the obtained differences are reliable statistically,
and all of these favor the girls. The
differences favoring the girls at the high school
level in the tenth and twelfth grades,
although not statistically reliable, have a crit- 
ical ratio of more than two and indicate that 
the difference should probably be greater than 
zero in their favor ninety-nine times out of a 
hundred. The only statistically reliable differ- 
ence appearing at the college level favors the 
sophomore girls. This difference at the soph- 
omore level is quite similar to that found by 
Berman and Bird⁴ in testing a group of 463
sophomores. When all the male subjects are
compared with all the female subjects in the
present study, the difference in reading speed
favors the girls and is statistically reliable.

⁴ Berman and Bird, op. cit.


TABLE II

A COMPARISON OF THE MEAN NUMBER OF PARAGRAPHS READ BY THE SUBJECTS OF BOTH SEXES
AT EACH SUCCESSIVE CLASS LEVEL FROM THE EIGHTH GRADE THROUGH THE SENIOR YEAR
IN COLLEGE, WITH THE STANDARD ERROR OF THE DIFFERENCE, THE CRITICAL RATIO, AND
CHANCES IN ONE HUNDRED THAT DIFFERENCES ARE SIGNIFICANT

Grade             Sex      Mean    Difference   S.E. of    C. R.   Chances
                                   of Means     Diff.              in 100
Eighth            Boys     21.24    3.09        1.11       2.78    100
                  Girls    24.33
Ninth             Boys     24.03    1.65         .30       5.55    100
                  Girls    25.68
Tenth             Boys     23.21    3.51        1.59       2.21     99
                  Girls    26.72
Eleventh          Boys     24.55    3.53        1.06       3.30    100
                  Girls    28.08
Twelfth           Boys     28.86     .59         .24       2.46     99
                  Girls    29.45
Freshman          Boys     29.64     .30         .24       1.24     89
                  Girls    29.94
Sophomore         Boys     28.51    2.12         .25       8.41    100
                  Girls    30.63
Junior            Boys     32.91    2.67*       1.23       2.17     98
                  Girls    30.24
Senior            Boys     32.19    1.29*       1.23       1.05     85
                  Girls    30.90
Combined Total    Boys     27.81    1.99         .34       5.80    100
                  Girls    29.80

* Differences favoring the boys.

The mean number of paragraphs read by 
the girls in the junior and senior college years 
changed little, but that read by the boys was 
markedly superior to their freshman and 
sophomore contemporaries. This abrupt in- 
crease in the number of paragraphs read by 
the college men seems to support the idea of 
selectivity of sampling. 

No data are presented in this study to show 
the exact amount of overlapping of reading 
scores at each grade or class level. By inspection
of Tables I and II it can be seen that a
marked degree of overlapping occurs. 

Summary of Results. Girls appear to be
consistently more rapid readers than boys at 
each grade level from the eighth grade in 
junior high school through the sophomore 
year in college. This superiority is apparent 
even though the sampling of girls was more 
positively skewed than was that of the boys. 
The difference between the means in the num- 
ber of paragraphs read by the boys and by 
the girls was statistically reliable in only four 
of the nine comparisons, but all four instances 
favored the girls. When all the girls’ scores 
are combined into one distribution and com- 
pared with the combined scores of the boys, 
the mean score of the girls excels that of the 
boys and the difference is statistically reliable. 
Boys surpassed girls in the number of paragraphs
read at the junior and senior years in
college, but this difference was not great 
enough to be statistically reliable. 

The data of this study seem to confirm the
general findings of Berman and Bird,⁵ who
reported a difference in favor of the girls in
reading speed.

The portion of this study which deals with 
high school students is not in agreement with 
the findings reported by Traxler.⁶ The test
used by Traxler differed greatly from the one
used in this study and may account for the
dissimilar results. That reading speed is
influenced by the type of test administered has
been shown by Tinker.⁷

The sex differences revealed in this study,
while apparent, were not marked enough to
warrant the introduction of separate standards 
or requirements in reading for girls or for 
boys within the grade range included here. 

The fact that nineteen of the subjects in 
this study read entirely through the 56 paragraphs
correctly within the time limit allotted
seems to indicate that the time limit might be 
shortened when the Van Wagenen Rate of 
Comprehension Test is used for college 
students. 


⁵ Berman and Bird, op. cit.

⁶ Traxler, op. cit.

⁷ Miles A. Tinker, "The Relation of Speed to Comprehension
in Reading," School and Society, XXXVI (1932), 158-6

















RELATIONSHIP OF FUSION WEAKNESS 
TO READING DISABILITY 


GaILe H. Goop 


Principal, Edison School 
Eugene, Oregon 





A great number of attempts have been 
made to isolate definite factors which might 
have a significant correlation with reading 
disability. A review of the literature on the 
subject does not yield evidence of any par- 
ticular physical or mental (intelligence elim- 
inated) status as a constant dominant factor. 
Previous research relative to vision as a factor 
in learning to read has not been able to point 
toward any conclusive evidence. However, 
the work of Paul Fendrick (1) in his Doctor’s 
Dissertation on “Visual Characteristics of 
Poor Readers”, that of Emmett Albert 
Betts (2), and others exhibit implications 
which require further attention. 

Myopia* (nearsightedness), hyperopia
(farsightedness), astigmatism, and heterophoria
have been considered. These, as they were 
found in large numbers of cases of both good 
and poor readers, do not account for any 
great percentage of reading disability. Also, 
since the factors of hyperopia, myopia, and 
astigmatism are correctable with lenses, they 
cannot be considered as constant. The con- 
dition of marked imbalances, though, may be 
a factor contributing toward retardation. This 
implication is easily seen in the results of the 
work by Betts (4), Selzer (5), Fendrick (6), 
and Eames (13). In cases where muscle im- 
balances are highly pronounced they may pro- 
duce deviations of retinal images sufficient to 
cause double or blurred vision. Here the 
reader must compensate for this condition, 
hence be handicapped in learning to read. 
Compensation apparently results in fatigue, 
strain, nervousness, or confusion in the indi- 
vidual. It should be noted that in the condi- 
tion of imbalance where duction is normal, 
fusion of the retinal areas is nearly complete, 
or less so, according to the degree of imbal- 
ance. It logically follows that the less com- 
plete the fusion, the greater the need for 
compensation. This corollary is evident: the 
greater the duction strength of the recti 
muscles, the greater the possibility for forced
fusion when called upon; or, conversely, the
weaker the duction muscles, the less able they
are to force fusion, hence the greater the need for
compensation.

* A glossary of technical terms may be found at the end
of this discussion.


One cannot assume that there may be per- 
fect fusion of the two retinal areas at reading 
distance if the eyes have no imbalance. There
is a condition known as asthenic orthophoria 
(weak duction power of the recti muscles). 
This duction power, especially adduction and 
abduction, must be recognized as definitely 
essential for good vision. The diagnosis be- 
tween sthenic and asthenic orthophoria must 
be made before conclusions can be drawn 
that muscular conditions of the eyes are not 
dominant factors in retarding reading. 


For purposes of clarity in subsequent dis- 
cussion of findings and conclusions in this 
study, the following description relative to 
imbalance and fusion may be pertinent. It is 
well known that binocular single vision is 
dependent upon the image striking corre- 
sponding retinal points. The macula or fusion 
area of the one eye must correspond point for 
point with the fusion area of the other eye. 
The vertical meridians of the two eyes must 
everywhere correspond and the same relation- 
ship between the horizontal meridians must 
exist. Diplopia will result if the two images 
of an object looked at fall on parts of the 
two retinae that do not correspond. There
are but two ways of preventing double vision
when this unnatural condition of the eyes
exists: forced fusion or mental suppression
of one image. The normal
function of the extrinsic muscles of each eye 
is so to relate the two retinae as to prevent 
double vision. 

In addition to that of aiding in holding the 
eyes in balance, the recti muscles have an- 
other function. The interni control conver- 
gence of the two eyes (adduction), the externi 
control divergence (abduction). A widely accepted
standard of lifting power, expressed in prism
degrees, is 15 to 25 for adduction and 5 to 8
for abduction. Any fusion accomplished with
duction power below this standard would be
done under strain to these
muscles. 
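The standard just quoted amounts to a pair of lower bounds, and a check against it can be written in a few lines. The sketch below is a hypothetical helper (`below_standard` is not from the study), and it reads the text as meaning that any value under the lower end of a range counts as below standard.

```python
# Standard lifting power quoted in the text: (lower, upper) bounds.
ADDUCTION_STANDARD = (15, 25)
ABDUCTION_STANDARD = (5, 8)


def below_standard(adduction, abduction):
    """List the ductions that fall below the lower end of the standard range."""
    weak = []
    if adduction < ADDUCTION_STANDARD[0]:
        weak.append("adduction")
    if abduction < ABDUCTION_STANDARD[0]:
        weak.append("abduction")
    return weak


if __name__ == "__main__":
    print(below_standard(20, 6))     # within both ranges
    print(below_standard(4.2, 2.4))  # experimental right-eye averages
```

Taken literally, this check also flags the control-group averages reported later (14.7 adduction and 4.7 abduction, right eye). That fits the article's own later remark that reading can be accomplished with less than 15 P.D. of adduction.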

The problem of this study was to ascertain 
the existing relationship between muscle 
fusing power and reading disability. The 
study has also taken into account, for pur- 
poses of verification, the ocular factors of 
myopia, hyperopia, astigmatism, and
heterophoria.

The fifty-five cases utilized were selected 
from two elementary schools in the Eugene 
school system, Eugene, Oregon, and one ele- 
mentary school in Springfield, Oregon. The 
three schools may be considered adjacent. 
Twenty-five pupils drawn for the experimental
group were definitely clinical reading
cases.¹ Their reading status was obtained
from each respective teacher and principal of 
the school. Each case was given an oral read- 
ing examination by the writer to verify the 
reading condition. Finally, for each case a 
reading score was secured by means of Form 
“A”, Progressive Test Series (14). Norms 
which have been established for this series 
are equivalent to those in general use in 
standardized achievement tests. Cases were 
arbitrarily limited to those whose intelligence 
quotient registered above 88 by means of the 
Stanford Revision of the Binet Simon Intelli- 
gence Test. 

Twenty-five children used in the control 
group were selected from those whose school 
record, verified by the teachers, exhibited no 
difficulty in learning to read. Tests from the 
same reading series were used to ascertain 
their reading ability. The intelligence ratings 
of these pupils were secured by means of the 
same tests as used for the experimental group.
Each case in the control group approximately 
matched one case of the experimental group 
in age, sex, school experience, and intelligence 
quotient. 

A third group of five was studied which was 
representative of those who in the past showed 
reading difficulty, yet at present were doing 
fairly satisfactory reading work. 


All of the ocular testing was done by a
professional eye specialist in his own laboratory
with his standard equipment. Through
his interest in the project and willingness to
cooperate in the study, Dr. Gavven Dyott, of
Eugene, Oregon, generously offered his time,
laboratory, and equipment to examine all
cases. Each child was subjected to tests of
distant vision, astigmatism, imbalance, adduct
lifting power, and abduct lifting power of the
eyes.

¹ Cases were considered "clinical" when progress depended
upon individual help.

A study of Tables I, II, III, and IV reveals
many significant factors:


A. Each pupil possessed normal or nearly 
normal distant vision. 

B. Astigmatism found in the three groups 
was comparable in degree. Few were 
completely free from astigmatism. 
Four cases had marked astigmatism. 

C. Many cases of imbalance were evident 
in all groups. However, the greatest 
degree of imbalance favored the experi- 
mental group. The tables show ten 
cases with marked imbalance in the ex- 
perimental group, none in the control, 
and two in the unmatched group of five. 
Imbalance was considered marked when 
3 p.d. (prism diopters) or more were 
necessary for correction. 

D. Twelve cases in the experimental group
possessed no marked astigmatism and
no imbalance.

E. Each case in the experimental group
showed a decided weakness in adduction
and abduction. Their adduction
power ranged from 2 P.D. to 8 P.D.,
with an average of 4.2 P.D. right and
4.1 P.D. left.

F. The average in the control group for 
adduction was 14.7 P.D. right and 14.4 
P.D. left. The average abduction 
measured 4.7 P.D. right and 4.5 P.D. 
left. 

G. Table IV clearly reveals the following
facts:

(a) The weakness of the adduct and
abduct muscles is consistent in experimental
cases of one year
through seven years of school
experience.

(b) With the increase of the variable,
number of years of school experience,
there is a persistent increase
in difference between the reading
status of the experimental group
and that of the control group.

H. Sixty per cent of the cases in each of
the experimental and control groups
were found to possess refractive errors.


4. M 10-4 94 6 2 2 20 20 “75 .50 @P.0. 2 2
Hyper 2 2 
6 vo 8-11 96 7 1 1 20 20 2 2 
20 20 2 2 


SYMBOLS: INTERPRETATION OF SYMBOLS IN CHARTS
AST.          ASTIGMATISM
AD.           ADDUCTION
AB.           ABDUCTION
C.A.          CHRONOLOGICAL AGE
I.Q.          INTELLIGENCE QUOTIENT
YRS. OF EXP.  YEARS OF EXPERIENCE
R.V.          READING VOCABULARY
R.C.          READING COMPREHENSION
SEX  C.A.  I.Q.  YRS.     READING TEST   DISTANT VISION   ASTIG.   IMBALANCE   DUCTION
                 OF EXP.  R.V.   R.C.    R      L         R    L              AD R  L
                                                                              AB R  L
F 6-3 105 1 3.6 2.5 20 MARKED if ‘ 
20 2.25%90 «70*90 > 34 
u 6-4 105 1 6 27.6 29 15 15 
20 é 
F 109 1 2.3 1.3 20 20 15 15 
5 a5 5 5 
v7 107 i 1.7 o3 22 20 16 «615 
15 15 4 4% 
M 7-3 113 + 4.5 367 20 14.5 25 
20 5 
f “5 103 2 3.3 4.3 20 14.5 14 
20 5 € 
“ 75 114 2 3.5 4.6 20 16 = 15 
20 + 4.* 
F "72 109 2 3.6 3.2 20 2.25*90 160s 
20 i?) v 
v "7-5 121 2 3.5 3.5 20 -76*90 +25*90 i5 «it 
20 s ‘ 
v 7-9 103 2 3.6 4.0 20 1H 8015 
20 5 : 
M 7-9 116 2 5.5 4.6 29 3790 -62*90 SLIGHT 15 
20 es 5 
M 710 106 2 3.7 4.1 20 16 «15 
20 4 5* 
M @e-2 128 2 47 5.8 20 SLIGHT ae? “23 
20 ES 5 4 
u e-2 101 7 4.5 4.5 20 .75*90 -'7&*90 14 14 
30 5 E 
M 8-10 18 3 3.8 5.4 20 15 14 
15 5 4 
M 100s «101 3 4.9 4.6 ) ae e 
4 4 
F gS 106 4 4.9 4.8 20 .12%90 .12*90 1 14 
20 + , 
M 9-5 104 4 4.9 4.8 20 2490 -«50*90 15 5 
20 
vw 10 120 4 4.9 5.6 20 16 hs 
20 44 4 
M 10-3 696 5 6.2 4.8 20 .75x90 «5090 SLIGHT 1s " 
ya) HreerR z4 4 
M 10-5 108 5 $.2 5.1 35 .50*%90 .50*90 15 
15 4 a 
“ 11 108 5 6.7 ” -25*90 .50*90 1 § 
5 4 
v 11-6 9% 5 £.8 5.2 20 37xgo .azxg0 5 P.O. 16 1 
20 Hreer t 5 
u 12-3 «103 6 5.7 6.0 20 1 P.D is 15 
Zo Ss 5 6 
uM 12-3 9 ” 6.0 6.2? 20 12"9q 12*90 15 5 
4 4 
TABLE III
PAST CLINICAL CASES

NOS.  SEX  C.A.  I.Q.  YRS.     READING TEST   DISTANT VISION   ASTIG.   IMBALANCE   DUCTION
                       OF EXP.  R.V.   R.C.    R      L         R    L              AD R  L
                                                                                    AB R  L

Mo 7=9 116 2 2.5 3.0 20 20 NONE HYPER 8 6 
20 20 SLIGHT 3.5 3.5 

2 M 9-3 120 4 4.0 3.0 20 20 SLIGHT bake 8 4 
20 20 § P.D. 4 4 

Mo 9-9 = 6118 4 5.5 5.6 20 20 1x90 .67x90 HYPER 9 9 

20 20 34 P.0. 3 3 

4 M 12-2 121 7 5.9 9.0 20 20 NONE HYPER 4.5 4.5 
20 20 34 P.O. 4 4 

5 M 12-1 120 6 5.9 6 20 20 .37%90 .25X90 NONE i 8 
15 15 2 2 


TABLE III SHOWS THE CONDITION FOUND IN FIVE PAST "CLINICAL" CASES.
ALL ARE NOW DOING FAIRLY SATISFACTORY WORK IN READING.
NUMBER ONE HAD TROUBLE BEGINNING FIRST GRADE AND NEEDED INDIVIDUAL HELP UNTIL THE LAST PART
OF HIS FIRST YEAR, BUT IS ERRATIC IN HIS READING ACHIEVEMENT.
NUMBER TWO IS NOW IN HIS REGULAR GRADE BUT IS CONSIDERED WEAK.
NUMBER THREE NEEDED INDIVIDUAL HELP THROUGH HIS FIRST AND SECOND GRADE WORK. HE IS NOW
DOING SATISFACTORILY.
NUMBER FOUR HAS BEEN A CLINICAL CASE FOR FOUR YEARS, IS STILL QUITE SLOW BUT ABLE TO MANAGE
TO A FAIR DEGREE.
NUMBER FIVE HAS BEEN DOING WELL AT TIMES, ACCORDING TO HIS TEACHER'S REPORT.


I. Forty per cent of the experimental and
20 per cent of the control cases possessed
ocular imbalances of some sort, the
greater degree of imbalance favoring
the experimental cases.

J. Ninety-two per cent of the experimental
cases measured less than 8 prism diopters
of adduction, while none of the control
cases registered less than 8 prism
diopters.


There has been no attempt in this study to
determine the duction power necessary for
binocular single vision at reading distance
without strain or discomfort. The tables do
reveal, however, that the poorest readers
(imbalance and duction combinations not
considered) have the least duction power, that
the better of the clinical readers have greater
duction power, and that all good readers
possess normal duction power.


Table III shows the condition of the eyes
in the group of five past clinical cases.


It will be noted that all have an adduction 
power of 8 to 9 P.D., except two who broke 
at 4.5 P.D., but had a suspending power to 
12 P.D. forced adduction. This information 
is significant in the light of the conditions 
found in both clinical and control groups. 
It is apparent, therefore, that reading can be 
accomplished with less than 15 P.D. adduc- 
tion. It is very evident, however, from this 
study that less than 8 P.D. adduction be- 
comes a disturbing factor, and more so as the 
adduction power decreases. 
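The percentages in findings I and J come from applying two cut-offs to each case: imbalance is "marked" when 3 P.D. or more is needed for correction, and adduction below 8 P.D. is treated as a disturbing factor. The sketch below tallies those cut-offs over a group. The `Case` record, the helper names, and the 25-case group are hypothetical. The group was built only so that its percentages match the ones reported for the experimental group; it is not the study's individual data.

```python
from dataclasses import dataclass

MARKED_IMBALANCE_PD = 3.0  # imbalance "marked" when 3 P.D. or more corrects it
LOW_ADDUCTION_PD = 8.0     # adduction below 8 P.D. treated as disturbing


@dataclass
class Case:
    imbalance_pd: float  # prism diopters needed for correction (0 if none)
    adduction_pd: float  # adduction lifting power, prism diopters


def percent(cases, predicate):
    """Percentage of cases for which predicate(case) is true."""
    return 100.0 * sum(1 for c in cases if predicate(c)) / len(cases)


def tally(cases):
    """Apply the study's cut-offs to a group of cases."""
    return {
        "any_imbalance": percent(cases, lambda c: c.imbalance_pd > 0),
        "marked_imbalance": percent(cases, lambda c: c.imbalance_pd >= MARKED_IMBALANCE_PD),
        "adduction_below_8": percent(cases, lambda c: c.adduction_pd < LOW_ADDUCTION_PD),
    }


if __name__ == "__main__":
    # Hypothetical group of 25, constructed to mirror the reported percentages.
    group = [Case(3.5, 5.0)] * 10 + [Case(0.0, 4.0)] * 13 + [Case(0.0, 9.0)] * 2
    print(tally(group))
```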


Previously in this paper reference was 
made to imbalance, and it was pointed out 
that there may be some relationship between 
imbalance and reading disability. Referring 
to Table I of experimental cases numbers 1, 
2, 8, 14, 15, 17, 21, and 24, one observes com- 
binations of marked imbalance with adduc- 
tion power from 5 P.D. to 8 P.D. The accom- 
panying reading condition in each case was 
extremely weak. Two of the cases, numbers 3 
and 13, scored zero or nearly so. Other com- 
AGGREGATE DATA OF CLINICAL AND CONTROL CASES
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2 3 S 2¢ ez. 8833 $82.0 «5: 
« ” a a > wt a< z a< =) w D> 
> o wo « eS -«s o - oa < z= xr<a © - 

. > « ww w & = = > oOo ® -ae *Qro@ >o 
fo) o- > o > . w @ 2 22a >z-o0ao >z-s > ze 
=z oa o <a < <«<-_ a<a se a < Fw =a<« = - 4-7 

| BIBIE C E E C E Cc E Cc E Cc E x 
RV RC RV RC R_L{[R L R LIR a 
1. 2 2 6.6 6.6 105 108 6.1.9 1.3 2.6 2.3% 2 fe) . 37 @ o 4 5 19 15 24 
4 a 1 2 5 4t 
2. 97 2 F464 %F6 109 112 2.3 2.2 3.8 3.9 4 . = 3s © 4.5 6 14.7 14.4 § 
2 2 3.5 3 4.9 4.9 
3. 3% 0 8.3 9 96.6 100 1.1 1.2 4.4 4.2 1 ®* @ 2 ee 0 2.6 2.5 13 13 z 
it i+ 2 2 48 4.3 
4- 2 1 9.3 9.3 106% 109 3.1 2.6 4.9 5.1 1 o 4 8 0 © 2.7 2.2 15 13.6 2 3 
4 4 1.51.54 3.5 
5. 4 0 10.8 10.8 102 101 2.8 2.9 5§.8 5.5 2 2 3.5 4135 18 § $3 18 18 2 3 
2.7 2.7 6 > 2 4 4 
6. 1 012.8 12.3 68 103 3.4 2 5.7 8.0 fe) 1 fe) 015 15 2.5 2.5 i 
5 5 2 2 
7. 1 @©12.7 12.3 92 99 4.1 4.4 8.0 8.2 0 fe) 0 ° 0 03.5 3 15 15 1 
2 2 4 4 
AVERAGE I.Q. EXPERIMENTAL 104
AVERAGE I.Q. CONTROL 106
AVERAGE ADDUCTION EXPERIMENTAL RIGHT 4.2 LEFT 4.1; CONTROL R. 14.7 L. 14.4
AVERAGE ABDUCTION EXPERIMENTAL RIGHT 2.4 LEFT 2.4; CONTROL R. 4.9 L. 4.5


binations of like nature occur and are worth 
further study, but the purpose of this paper 
is merely to point out the more significant 
factors and implications of the study. 

One is forced to conclude from the scope 
of this research that adduction and abduction 
weaknesses definitely accompany difficulty in 
learning to read. Retests following corrective
ocular treatment for increasing duction
strength show marked improvement in 
reading. 

GLOSSARY 


asthenic, absence of strength. 

astigmatism, that condition of the eye in 
which rays of light from a point do not 
converge to a point on the retina. 

binocular, pertaining to both eyes. Binocular
vision is the faculty of using both eyes
synchronously and without diplopia.


duction, a colloquialism used to represent one 
or more of the terms abduction, adduction, 
or sursumduction. 

extrinsic muscles, referring to the external 
muscles of the eyes. 

heterophoria, a relation of the visual lines of
the two eyes other than that of parallelism.

imbalance, lack of muscular balance of the 
eyes. 

macula, macula lutea is the point of clearest
vision.

orthophoria, normal balance of the eye
muscles.

recti muscles, referring to rectus externus
and rectus internus.

sthenic, strong.